Examples

Pre-calculated examples

Table 1. Detection of binding sites on a non-redundant (< 30% sequence identity) test set of PDB structures. Residues with structural similarity scores of 7, 8, or 9 are considered binding sites. SP is specificity; SE is sensitivity; P-value is the significance of the prediction.
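The columns P-value, Num. residues, SP, and SE describe the ProBiS detected united binding site. The last four columns give the sensitivities for different binding site types.

| PDB code and chain ID | Total num. residues | Interface num. residues | P-value | Num. residues | SP | SE | Small-ligand | Protein-protein | DNA | Other |
|---|---|---|---|---|---|---|---|---|---|---|
| 1all.A | 160 | 68 | 2.20E-06 | 48 | 0.708 | 0.500 | 0.483 | 0.523 | - | - |
| 1apm.E | 338 | 57 | 2.50E-07 | 101 | 0.337 | 0.596 | 0.774 | 0.525 | - | - |
| 1aze.A | 56 | 23 | 7.60E-02 | 15 | 0.600 | 0.391 | - | 0.250 | - | 0.714 |
| 1bnc.A | 433 | 75 | 1.60E-06 | 123 | 0.317 | 0.520 | 0.788 | 0.310 | - | - |
| 1daa.A | 277 | 82 | 2.10E-02 | 70 | 0.400 | 0.341 | 0.885 | 0.143 | - | - |
| 1dfj.E | 124 | 46 | 4.60E-02 | 36 | 0.500 | 0.391 | 1.000 | 0.378 | - | - |
| 1dfj.I | 456 | 49 | 1.60E-01 | 117 | 0.137 | 0.327 | - | 0.327 | - | - |
| 1efu.A | 364 | 92 | 5.50E-14 | 112 | 0.518 | 0.630 | 1.000 | 0.590 | - | - |
| 1g3n.A | 293 | 89 | 7.00E-01 | 87 | 0.287 | 0.281 | 0.594 | 0.154 | - | - |
| 1g3n.B | 155 | 28 | 1.00E+00 | 46 | 0.043 | 0.071 | - | 0.071 | - | - |
| 1g3n.C | 233 | 38 | 6.60E-03 | 62 | 0.274 | 0.447 | - | 0.447 | - | - |
| 1got.A | 326 | 63 | 2.90E-05 | 98 | 0.337 | 0.524 | 1.000 | 0.167 | - | - |
| 1got.B | 339 | 123 | 9.90E-01 | 93 | 0.269 | 0.203 | - | 0.203 | - | - |
| 1hcg.A | 229 | 59 | 1.10E-01 | 69 | 0.319 | 0.373 | 0.615 | 0.182 | - | - |
| 1k9o.E | 223 | 37 | 9.20E-04 | 68 | 0.294 | 0.541 | - | 0.541 | - | - |
| 1k9o.I | 376 | 94 | 1.30E-05 | 98 | 0.418 | 0.436 | - | 0.436 | - | - |
| 1lw6.I | 63 | 18 | 5.50E-01 | 13 | 0.308 | 0.222 | - | 0.222 | - | - |
| 1rrp.A | 204 | 100 | 4.30E-01 | 61 | 0.508 | 0.310 | 0.848 | 0.083 | - | - |
| 1tco.B | 169 | 108 | 1.90E-05 | 45 | 0.889 | 0.370 | 0.791 | 0.129 | - | - |
| 1ugh.E | 223 | 35 | 8.60E-04 | 41 | 0.341 | 0.400 | - | 0.400 | - | - |
| 1ytf.A | 180 | 59 | 2.20E-08 | 51 | 0.647 | 0.559 | - | 0.500 | 0.630 | - |
| 1ak4.A | 165 | 22 | 1.80E-07 | 45 | 0.38 | 0.77 | - | 0.77 | - | - |
| 1ay7.A | 96 | 21 | 4.00E-06 | 24 | 0.58 | 0.67 | - | 0.67 | - | - |
| 1bvn.P | 496 | 60 | 2.00E-07 | 141 | 0.25 | 0.58 | 0.83 | 0.5 | - | 1 |
| 1efn.B | 104 | 25 | 1.30E-05 | 29 | 0.55 | 0.64 | - | 0.64 | - | - |
| 1fqj.B | 133 | 31 | 2.50E-04 | 37 | 0.46 | 0.55 | - | 0.55 | - | - |
| 1gcq.C | 69 | 30 | 9.60E-01 | 20 | 0.3 | 0.2 | - | 0.2 | - | - |
| 1gpw.A | 253 | 68 | 4.50E-03 | 71 | 0.39 | 0.41 | 0.39 | 0.43 | - | - |
| 1gpw.B | 200 | 28 | 8.60E-02 | 60 | 0.2 | 0.43 | - | 0.43 | - | - |
| 1grn.A | 191 | 48 | 2.60E-09 | 56 | 0.55 | 0.65 | 0.85 | 0.48 | - | - |
| 1grn.B | 197 | 44 | 6.30E-03 | 57 | 0.35 | 0.46 | - | 0.46 | - | - |
| 1h1v.A | 368 | 124 | 1.90E-01 | 98 | 0.38 | 0.3 | 0.82 | 0.08 | - | - |
| 1h1v.G | 327 | 67 | 2.00E-05 | 95 | 0.36 | 0.51 | 0.54 | 0.47 | - | - |
| 1ktz.A | 82 | 31 | 8.10E-02 | 28 | 0.5 | 0.45 | - | 0.45 | - | - |
| 1pxv.A | 183 | 40 | 1.20E-03 | 53 | 0.38 | 0.5 | - | 0.5 | - | - |
| 1pxv.C | 111 | 31 | 1.40E-03 | 32 | 0.5 | 0.52 | - | 0.52 | - | - |
| 1r8s.E | 187 | 47 | 4.60E-02 | 48 | 0.35 | 0.36 | 0.64 | 0.36 | - | - |
| 1s1q.A | 137 | 22 | 1.00E+00 | 36 | 0 | 0 | - | 0 | - | - |
| 1udi.E | 227 | 34 | 1.20E-05 | 65 | 0.32 | 0.62 | - | 0.62 | - | - |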
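
The SP and SE values in Table 1 can be cross-checked against the residue counts. The sketch below assumes that SP is the fraction of predicted residues that lie in the interface and SE is the fraction of interface residues that were predicted. This reading is inferred from the table values and is not stated in the caption. The function names are hypothetical and are not part of ProBiS.

```python
def sp_se(true_positives, predicted, interface):
    """Return (SP, SE) under the assumed definitions:
    SP = TP / predicted residues, SE = TP / interface residues."""
    sp = true_positives / predicted if predicted else 0.0
    se = true_positives / interface if interface else 0.0
    return sp, se


def implied_true_positives(sp, predicted):
    """Recover the integer TP count implied by a rounded SP value."""
    return round(sp * predicted)


# Row 1all.A of Table 1: 68 interface residues, 48 predicted residues,
# SP 0.708, SE 0.500.
tp = implied_true_positives(0.708, 48)
sp, se = sp_se(tp, predicted=48, interface=68)
print(tp, round(sp, 3), round(se, 3))
```

For this row the implied count is 34 correctly predicted residues, which reproduces both the SP and the SE value reported in the table.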

Heterotrimeric G protein

Heterotrimeric G proteins relay hormonal signals from transmembrane receptors to intracellular effectors. They consist of three subunits: alpha, beta, and gamma (see Fig. 1). When a hormone binds to its extracellular receptor, the intracellular membrane-bound G protein in turn binds to this hormone-receptor complex. The GDP that is bound to the G protein at rest is replaced by the GTP•Mg²⁺ complex, while the beta-gamma subunit splits off. The free alpha subunit then activates (or inhibits) various secondary effectors.

The alpha subunit itself has two domains. One adopts a transducin insertion domain fold and the other a P-loop containing nucleoside triphosphate hydrolase fold, according to the SCOP classification of protein structures (see Fig. 1). The P-loop, or phosphate-binding loop, is an ATP- or GTP-binding motif found in many nucleotide-binding proteins. It is a glycine-rich loop preceded by a beta strand and followed by an alpha helix. It interacts with the nucleotide phosphate groups and with the Mg²⁺ ion that coordinates the β- and γ-phosphates in GTP. Upon nucleotide hydrolysis the P-loop does not change conformation significantly, but stays bound to the remaining phosphate groups. In contrast, the alpha-beta binding site region does change its conformation significantly in this process (encircled in Fig. 1).

In our study, we used the crystal structure of the alpha subunit (PDB ID: 1got, Chain ID: a) bound to the beta-gamma subunit and to a GDP molecule as the input structure for the ProBiS algorithm. Owing to its prominent flexibility and the presence of several binding sites, this structure is well suited for testing the ability of ProBiS to detect binding sites. It was compared to nearly 23,000 protein structures in the current non-redundant protein database (nr-PDB), from which ProBiS retrieved 408 locally similar protein structures. Binding site predictions were obtained by calculating the structural conservation scores for the query protein residues based on these retrieved structures.

We can observe the conformational changes in the protein-protein binding site if we superimpose the bound structure (1got-a) onto the homologous (>90% sequence identity) but unbound alpha subunit (1tad-b) (see Fig. 2; the flexible segments are red and dark blue). The loop/alpha helix residues Asp196-Glu212 (referred to as switch II) adopt very different conformations in these two structures. Upon binding, the alpha helix partially unfolds between Arg201 and Arg204, and the whole segment tilts up to 8 Å away from the protein core and towards the beta-gamma subunit. This move is accompanied by a shift of the nearby residues Val175-Ile180 (referred to as switch I) by around 5 Å. Since structures of alpha subunits in the PDB may be bound or unbound, it is necessary to be able to find similarities in flexible segments when comparing these structures.
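
Shifts such as the 8 Å and 5 Å movements described above can be measured once two structures are superimposed. The following is a minimal sketch, not the ProBiS procedure: it assumes the coordinates are already superimposed and that residue numbering matches between the two chains. The function names, the toy coordinates, and the 3 Å cutoff are all illustrative assumptions.

```python
import math


def ca_displacements(bound, unbound):
    """Per-residue distance (in Å) between matching C-alpha atoms.

    Both arguments map residue number -> (x, y, z). The two structures are
    assumed to be superimposed already. Residues missing from either
    structure are skipped.
    """
    shared = sorted(set(bound) & set(unbound))
    return {res: math.dist(bound[res], unbound[res]) for res in shared}


def flexible_segments(displacements, cutoff=3.0):
    """Group consecutive residues whose displacement exceeds the cutoff."""
    segments, current = [], []
    for res in sorted(displacements):
        if displacements[res] > cutoff:
            # Start a new segment when the residue numbering is not contiguous.
            if current and res != current[-1] + 1:
                segments.append((current[0], current[-1]))
                current = []
            current.append(res)
        elif current:
            segments.append((current[0], current[-1]))
            current = []
    if current:
        segments.append((current[0], current[-1]))
    return segments


# Toy coordinates for illustration only; not taken from 1got or 1tad.
bound = {1: (0.0, 0.0, 0.0), 2: (3.8, 0.0, 0.0), 3: (7.6, 0.0, 0.0),
         4: (11.4, 0.0, 0.0), 5: (15.2, 0.0, 0.0)}
unbound = {1: (0.0, 0.0, 0.5), 2: (3.8, 0.0, 0.4), 3: (7.6, 5.0, 0.0),
           4: (11.4, 8.0, 0.0), 5: (15.2, 0.3, 0.0)}

shifts = ca_displacements(bound, unbound)
print(flexible_segments(shifts))
```

In this toy input only residues 3 and 4 move by more than the cutoff, so they are reported as one flexible segment. With real structures the coordinates would first have to be read from the PDB files and superimposed on the rigid core.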

Retrieved structures resemble the query protein to different extents: some resemble both alpha subunit domains (Fig. 3A), some resemble only one (Fig. 3B), while others share no fold similarity with either of the two domains, except for similar binding residues (Fig. 3C and D). The structurally aligned residues, which are usually located in the GDP binding region, are colored gold in Fig. 3. Although the two protein backbones could at times be aligned (e.g., Fig. 3B), the strongest structural similarity is usually restricted to binding site residues. This is the main difference between local and global alignment of proteins: the latter seeks similarity in protein backbones, whereas ProBiS aligns proteins on the most similar local surface interaction patterns. Not even small segments of backbone need to be similar. ProBiS can make use of these locally similar motifs to detect structurally conserved binding sites. Although they are in the minority, the detected cross-fold structural similarities (Fig. 3C and D) are important contributors to the final conservation scores and present interesting cases for further discussion.

Exploring structures with decreasing fold similarity to the query, we find structures that share only local similarities with our query protein. These similarities are detected especially in binding sites, as in the case presented in Fig. 3C. A detailed view of the aligned binding site region is shown in Fig. 4. Here, a similarity between the GDP binding site in the alpha subunit (query) and the ADP binding site of myosin VI (2vb6-a) was detected. The two compared proteins are non-homologous (sequence alignment with BLAST yields an E-value of 0.74, and global structural alignment with MAMMOTH produces no meaningful alignment), yet both possess a conserved P-loop that is responsible for binding the nucleotide phosphate groups. The phosphates are superimposed by the transformation and adopt exactly the same conformation in both structures, as shown in Fig. 4. The Mg²⁺ ion in the aligned structure is also shown.

Perhaps the most interesting are structures that do not share the fold of the query but have, through evolution, converged to a similar binding site. In this case, however, false positives that are due to chance similarities in unrelated proteins may be mistaken for a real binding site similarity. It is difficult to distinguish real from artifactual similarities if no ligands are present in the structure or if the binding site location is not known. The alignment scores that ProBiS assigns to each local alignment (especially the E-value) are intended to flag such false positives. In one of the retrieved structures (1od6-a, Fig. 3D) we could confirm that it binds a ligand (ADP) similar to the one in our query (GDP). Phosphopantetheine adenylyltransferase, which by the SCOP classification adopts an adenine nucleotide alpha hydrolase-like fold (violet in Figs. 3 and 5), is aligned to the query structure, the G protein alpha subunit, which by the SCOP classification adopts a P-loop containing nucleoside triphosphate hydrolase fold (blue in Figs. 2 and 4). The E-value for this alignment is 1.1 × 10⁻⁷. The detailed view of the aligned binding site region is shown in Fig. 5, where ADP is not shown because it was not present in the crystal structure. The similarity between the binding sites is due to a similar function of the two compared proteins, i.e., phosphate binding. Our attention was first attracted by the SO₄²⁻ ion, which aligns almost perfectly with the terminal phosphate group of GDP from the query structure (see Fig. 5). Subsequent comparison with related PDB entries (e.g., 1gn8-a) revealed that the SO₄²⁻ position is usually occupied by an ADP phosphate group. The alignment pairs residues in the P-loop and in the alpha helix preceding the loop in the alpha subunit (query structure) with the alpha helix in the similar structure.

We consider all residues with local structural conservation scores of 7 or more to be predicted binding site residues. Using this definition, the GDP (and Mg²⁺) binding site on the alpha subunit was predicted with a sensitivity of 0.852. In other words, 85.2% of the binding site was found to be structurally conserved (see Fig. 6 and the zoom-in on the most conserved GDP binding site in Fig. 7). Not surprisingly, the P-loop with the sequence GAGESGKS (residues Gly36-Ser43) is designated as the most structurally conserved part of this protein (see Fig. 7). These so-called fingerprint residues have been found to be conserved in the best structural alignments (those with the highest alignment lengths to the query) and are used to scan the remaining alignments for similar motifs. Structures having the same fingerprint motif are thereby retrieved. Fingerprints appear as vertical stripes of residues highlighted in red in the structure-based sequence alignment (see Figs. 8 and 9). Another fingerprint residue, conserved in most of the retrieved proteins, is Asp196 (see also Fig. 9). The structurally conserved Asp196 is part of a conserved sequence motif, DXXG, found in all regulatory GTP binding proteins, and forms the amino-terminal segment of switch II. Asp196 serves to connect the Mg²⁺ site to switch II via a water molecule that is bound to the Mg²⁺ ion.
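
The prediction rule above (a conservation score of 7 or more) and the per-site sensitivity can be illustrated with a short sketch. The residue scores below are made up for illustration and are not ProBiS output, and the function names are hypothetical.

```python
def predict_binding_residues(scores, threshold=7):
    """Residues whose structural conservation score is >= threshold."""
    return {res for res, score in scores.items() if score >= threshold}


def site_sensitivity(predicted, site_residues):
    """Fraction of a known binding site's residues that were predicted."""
    if not site_residues:
        raise ValueError("binding site has no residues")
    return len(predicted & set(site_residues)) / len(site_residues)


# Made-up conservation scores keyed by residue number; illustration only.
scores = {36: 9, 37: 9, 38: 9, 39: 8, 40: 9, 41: 9, 42: 9, 43: 8, 44: 5, 45: 3}
# Hypothetical list of residues known to contact the ligand.
site = [36, 37, 38, 39, 40, 41, 42, 43, 44, 45]

predicted = predict_binding_residues(scores)
print(site_sensitivity(predicted, site))
```

Here 8 of the 10 site residues reach the threshold, so the sensitivity for this toy site is 0.8. The same calculation, applied separately to each binding site type, would yield per-site values like those reported for the GDP site.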

The protein-protein binding site on the alpha subunit, i.e., the alpha-beta binding site, which spans mainly the switch II region, was predicted with a sensitivity of 0.389. In other words, 38.9% of this binding site is structurally conserved (see Fig. 6). This binding site has a dual role: it binds the beta-gamma subunit, as is the case in our query crystal structure, but it also interacts with adenylate cyclase after the beta-gamma subunit has split off (e.g., 1cul). Our observations concern only the alpha-(beta-gamma) binding. The considerably lower structural conservation compared with that of the GDP binding site could be a consequence of the C-terminal alpha helix (blue in Fig. 5), which is also part of the alpha-beta binding site, being absent from most of the retrieved structures. On the other hand, in this study protein-protein binding sites have generally been found to be less well conserved than binding sites for small ligands. Still, residues Val197-Gln200 in the flexible part of the binding site are recognized as well conserved (structural conservation score: 9) and residues Arg201-Glu212 as moderately conserved (structural conservation score: 7-8) (see Figs. 5 and 9). To distinguish conserved residues that adopt different conformations in the compared structures (i.e., lie on flexible backbone segments) from those that adopt the same conformation (i.e., can be rigidly superimposed), they are colored gray and black, respectively, in the structure-based sequence alignment (see Figs. 8 and 9).