abstract
In 2012, the Baker group published a paper in Nature entitled "Principles for designing ideal protein structures" (here's the reference, if you're comfortable with the rather dry language of academic biology). The work drew partly on known structures of naturally occurring proteins and partly on large-scale simulations of artificial proteins by the volunteer contributors of Rosetta@home. It laid down guidelines on which secondary structure patterns are most promising when designing new proteins. Susume has explained these rules before in a video, but sometimes it's easier to have a written reference for this kind of thing. So here's an attempt to summarize the main features of that paper. If you can follow these guidelines when designing proteins, there's a much better chance of the design being interesting (not to mention achieving a higher score).

In almost all of the following cases it doesn't matter which amino acids are present in any structure element. (The exception: hairpin loops in FoldIt are much easier to construct if you make the loop residues glycine. The mutate function may change them later.) Here are the structure patterns:

sheet - 2 residue loop - sheet: Sheet-loop-sheet motifs in which the two sheets are adjacent and antiparallel are very common. The key here is the number of residues in the loop, which determines whether the second sheet goes to the left or the right of the first sheet (using the coordinate system defined in step 1 below).
1) Arrange the first sheet (the one with lower residue numbers) so that the side chain of the last residue of the sheet points into the screen.
2) Then the second sheet should be to the LEFT of the first sheet.
3) When constructing this "hairpin" turn, life is much easier if you mutate the two loop residues to glycine. Also put a cutpoint in the middle of the loop and local-wiggle the loop to get a reasonable shape.

Here sheet 1 has residues 55-59, residues 60 and 61 make up the loop, and the second sheet has residues 62-66. Note that the side chain of residue 59 (arginine) points into the page, indicating that the second sheet goes to the left. Note also that one of the loop residues (61) shows no side chain, because it is a glycine. For this particular secondary structure sequence, the preference for going left might be considered an absolute rule rather than a guideline. The paper covers thousands of simulated cases where this motif occurred and many thousands more in naturally occurring proteins. It appears to show not a single case of the second sheet being to the right when the sheets are joined by a 2-residue loop.

sheet - 3 residue loop - sheet: Very similar to the previous case: the second sheet has a strong preference to go to the left.
1) Arrange the first sheet (the one with lower residue numbers) so that the side chain of the last residue of the sheet points into the screen.
2) Then the second sheet should, as before, be to the LEFT of the first sheet.
3) The loop here isn't quite as strained as in the 2-residue loop case, so glycines in the loop aren't necessary during construction. Mutate may still end up putting them there, though.
This preference is followed about 85% of the time (eyeballing the histograms in the paper), both in naturally occurring proteins and in designed ones. This motif occurs much less frequently in natural proteins than the 2-residue loop case (perhaps 10-15% as often).

sheet - 4 residue loop - sheet: It's 50/50 whether the second sheet goes right or left, in both natural and artificial proteins, so you don't have to worry about it unduly.
It occurs about twice as frequently in natural proteins as the 3-residue loop case, but is still uncommon compared with the 2-residue loop case.

sheet - 5 residue loop - sheet:
1) Arrange the first sheet (the one with lower residue numbers) so that the side chain of the last residue of the sheet points into the screen.
2) Then the second sheet should be to the RIGHT of the first sheet.
Here sheet 1 has residues 4-8, residues 9 through 13 make up the loop, and the second sheet has residues 14-18. Note that the side chain of residue 8 (serine, barely visible) points into the page, indicating that the second sheet goes to the right. This preference is followed about 70% of the time in designed proteins and about 95% of the time in naturally occurring ones (again eyeballing the histograms in the paper). This motif is also much more common in naturally occurring proteins than the 3- and 4-residue loop cases, but still not as frequent as the 2-residue loop case.

sheet - loop - helix: There are 2 preferred orientations for this setup. In both of them the helix is offset diagonally from the sheet. In one, the helix is in front of the sheet and slants to the right. In the other, it goes behind the sheet and slants to the left.

sheet - 2 residue loop - helix: Here the preference is for the helix to go behind the sheet and slant to the left. To avoid visual clutter, only the last side chain of the sheet (residue 9) is shown, pointing into the page as usual to provide a defined orientation. The loop (10-11) and the start of the helix (12) are also shown. When the loop is 2 residues long, this orientation is favoured over the alternative by about 10 to 1. It's actually quite hard in FoldIt to achieve this geometry without the helix and sheet getting too close. Furthermore, the distinction between a loop and a helix isn't all that clear.

sheet - 3 residue loop - helix: Here the preference is for the helix to go in front of the sheet and slant to the right.
To avoid visual clutter, only the last side chain of the sheet (residue 9) is shown, pointing into the page as usual to provide a defined orientation. The loop (10-12) and the start of the helix (13) are also shown. This orientation is favoured over the alternative in both natural and designed proteins, but in neither case is the preference overwhelming. It's about 2 to 1 in artificial proteins and 1.5 to 1 in naturally occurring ones.

helix - loop - sheet: Irrespective of the size of the loop (a bit peculiar, that), a single orientation is preferred. Again the helix is at an angle to the sheet, the first residue of which points into the plane.

The paper doesn't mention these: not sure why they wouldn't be worthy of a mention.
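If you like to keep notes while building, the loop-length preferences summarized above can be written down as a small lookup table and used as a checklist. The sketch below is a hypothetical Python helper, not part of FoldIt or of the paper. The names `Preference`, `SHEET_SHEET`, `SHEET_HELIX` and `check_motifs` are made up for illustration, and the strengths are only the rough figures eyeballed from the paper's histograms. You still have to judge each orientation yourself in FoldIt, using the convention from step 1 (last side chain of the first sheet pointing into the screen). The loop-length-independent case near the end is left out, since its preferred orientation isn't spelled out in words.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Preference:
    """Preferred placement of the second element, plus a rough strength."""
    orientation: str
    strength: str


# Hypothetical tables transcribed from the summary above; keys are loop
# lengths in residues. Orientation is judged with the first sheet arranged
# so that the side chain of its last residue points into the screen.
SHEET_SHEET = {
    2: Preference("left", "essentially always"),
    3: Preference("left", "about 85% of the time"),
    4: Preference("either", "roughly 50/50"),
    5: Preference("right", "about 70% designed, about 95% natural"),
}

SHEET_HELIX = {
    2: Preference("behind, slanting left", "about 10 to 1"),
    3: Preference("in front, slanting right", "about 2 to 1 designed, 1.5 to 1 natural"),
}


def check_motifs(table: dict[int, Preference],
                 observations: list[tuple[int, str]]) -> list[str]:
    """Return a warning for each (loop_length, observed_orientation) pair
    that goes against the table, or whose loop length the table lacks."""
    warnings = []
    for loop_length, observed in observations:
        pref = table.get(loop_length)
        if pref is None:
            warnings.append(f"{loop_length}-residue loop: not covered by these rules")
        elif pref.orientation != "either" and observed != pref.orientation:
            warnings.append(
                f"{loop_length}-residue loop: observed {observed}, "
                f"preferred {pref.orientation} ({pref.strength})"
            )
    return warnings


if __name__ == "__main__":
    # Four hypothetical hairpins from a design in progress.
    hairpins = [(2, "left"), (5, "left"), (4, "right"), (7, "left")]
    for warning in check_motifs(SHEET_SHEET, hairpins):
        print(warning)
```

Run on the four example hairpins, it flags the 5-residue loop that goes left and the 7-residue loop the rules don't cover. The 4-residue loop is accepted either way, which matches the 50/50 observation. A plain dictionary keeps the rules easy to edit if you read the histograms differently.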