
5  Summary of main results and conclusions

In the preceding chapters, the three main agents in the MHC-I pathway were examined with the goal of developing tools to predict their function in the antigen processing pathway. For peptide binding to MHC-I, a new prediction algorithm was developed. It combines a matrix-based method (SMM), which describes the contributions of individual residues to binding, with pair coefficients, which describe pair-wise interactions between positions in a peptide. This approach outperformed several previously published prediction methods, and for the first time quantified the impact of interactions within a peptide. The superiority of this approach is believed to be the consequence of three main novel features: (1) A regularization parameter is used, which prevents the pair coefficients and the matrix entries from overfitting the data. (2) The pair coefficients are determined by systematic investigation of differences between the matrix predictions and the experimental values. As the matrix method is already highly accurate on its own, this is a better starting point than trying to determine both position contributions and position interactions all at once. (3) The interactions under investigation are limited to those with a sufficient amount of consistent training data.
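The three features listed above can be illustrated with a minimal sketch. It is not the SMM implementation itself: it assumes that a scoring matrix is already available, treats each pair coefficient independently, and uses the closed-form ridge estimate `sum(residuals) / (count + lam)` for a single coefficient. The function names, the parameters `lam` and `min_count`, and the toy data are hypothetical, and the "consistency" criterion for the training data is reduced here to a simple minimum count.

```python
from collections import defaultdict
from itertools import combinations


def matrix_score(peptide, matrix, offset=0.0):
    """Sum the position-specific matrix entries for each residue of the peptide."""
    return offset + sum(matrix[pos][aa] for pos, aa in enumerate(peptide))


def fit_pair_coefficients(peptides, measured, matrix, offset=0.0,
                          lam=1.0, min_count=2):
    """Estimate pair coefficients from the residuals of the matrix prediction.

    Assumption: each coefficient is fitted on its own, so the ridge solution
    reduces to sum(residuals) / (count + lam). Pairs observed fewer than
    min_count times are skipped.
    """
    residual_sum = defaultdict(float)
    count = defaultdict(int)
    for peptide, value in zip(peptides, measured):
        residual = value - matrix_score(peptide, matrix, offset)
        for (i, a), (j, b) in combinations(enumerate(peptide), 2):
            residual_sum[(i, a, j, b)] += residual
            count[(i, a, j, b)] += 1
    return {key: residual_sum[key] / (count[key] + lam)
            for key in count if count[key] >= min_count}


def combined_score(peptide, matrix, pairs, offset=0.0):
    """Matrix score plus all fitted pair coefficients that apply to the peptide."""
    score = matrix_score(peptide, matrix, offset)
    for (i, a), (j, b) in combinations(enumerate(peptide), 2):
        score += pairs.get((i, a, j, b), 0.0)
    return score


# Toy two-position example with made-up numbers.
toy_matrix = [{"A": 0.0, "K": 1.0}, {"A": 0.0, "L": 0.5}]
toy_peptides = ["KL", "KL", "AL", "KA"]
toy_measured = [2.5, 2.5, 0.5, 1.0]
toy_pairs = fit_pair_coefficients(toy_peptides, toy_measured, toy_matrix)
```

In the toy data only the pair (K at position 1, L at position 2) is seen twice, so it is the only one that receives a coefficient, and the regularization shrinks it below the raw residual of 1.0.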

The distribution of the pair coefficient values showed that interactions between adjacent peptide positions are somewhat stronger than those between positions farther apart. However, this trend was seen to a much lesser extent than expected, signifying that interactions are not limited to neighboring amino acids in direct contact, but can also play a role over longer distances, probably through the conformation of the peptide backbone. Compared to the SMM matrix entries, the pair coefficients are rather small. This explains why methods that completely ignore interactions can still make good predictions.

Peptide affinities to TAP are considered to be closely related to their transport efficiencies. Therefore, the SMM matrix description developed to analyze peptide binding to MHC-I could also be applied to predict affinities of a set of 9-meric peptides to TAP. The SMM predictions were significantly better than those of two scoring matrices determined directly from experiments. Pair coefficients were not introduced here, to allow for the combination of all matrices into a single consensus matrix, which made the best overall predictions.



Using the experimental finding that binding of a peptide to TAP mainly involves its C-terminus and three N-terminal residues, a 9-mer scoring matrix can be employed to predict the affinities of peptides of any length by taking only these residues into account. This was demonstrated to give good predictions of TAP affinities for peptides 10 to 18 residues long. Being able to predict TAP affinities of peptides longer than 9 amino acids (the typical epitope length) is important because it has become clear that several MHC-I epitopes are generated by N-terminal trimming of precursor peptides that are likely to be transported into the ER by TAP. As the true in vivo precursors of an epitope are not known, a generalized TAP score was established which averages across the scores of all precursors up to a certain length.
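The scoring scheme described above might be sketched as follows. This is an illustration under assumptions, not the procedure used in this work: the matrix is taken to be a list of nine per-position dictionaries, rows 1 to 3 are applied to the three N-terminal residues and row 9 to the C-terminal residue, precursors are N-terminal extensions of the epitope within the source protein, and the epitope itself is included in the average. All names and the toy matrix are hypothetical.

```python
def tap_score(peptide, matrix):
    """Score a peptide of any length >= 4 with a 9-mer matrix.

    Only the three N-terminal residues (matrix rows 0-2) and the
    C-terminal residue (matrix row 8) contribute.
    """
    if len(peptide) < 4:
        raise ValueError("peptide must have at least 4 residues")
    n_term = sum(matrix[i][peptide[i]] for i in range(3))
    c_term = matrix[8][peptide[-1]]
    return n_term + c_term


def generalized_tap_score(protein, start, end, matrix, max_length):
    """Average the TAP score over the epitope protein[start:end] and its
    N-terminally extended precursors up to max_length residues."""
    epitope_length = end - start
    if max_length < epitope_length:
        raise ValueError("max_length must be at least the epitope length")
    scores = []
    for length in range(epitope_length, max_length + 1):
        precursor_start = end - length
        if precursor_start < 0:
            break  # the protein has no further N-terminal residues
        scores.append(tap_score(protein[precursor_start:end], matrix))
    return sum(scores) / len(scores)


# Toy matrix over a two-letter alphabet: K at row i contributes i + 1.
toy_matrix = [{"A": 0.0, "K": float(i + 1)} for i in range(9)]
toy_protein = "AK" + "A" * 8 + "K"
toy_score = generalized_tap_score(toy_protein, 2, 11, toy_matrix, max_length=11)
```

In the toy example the 9-mer, 10-mer and 11-mer precursors share the same C-terminus but differ in their N-terminal residues, so the generalized score is the mean of three different values.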

The highest prediction quality with this TAP score was achieved when the contributions of the N-terminal residues were down-weighted. It was reasoned on the basis of simulations and of results from scoring for individual MHC-I alleles that this down-weighting partially reflects co-evolution of TAP and the average MHC-I allele with respect to the preference for certain C-terminal residues, as well as the uncertainty about which epitope precursors are present in vivo. With this scoring method, the influence of TAP was found to be a consistent, strong pressure on the selection of MHC-I epitopes for all alleles. Using predicted TAP transport efficiencies as a filter prior to prediction of MHC-I binding affinities, it was possible to further improve the already very high classification accuracy achieved using MHC-I affinity predictions alone.
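A minimal sketch of the down-weighting and of the two-step protocol is given here for illustration. It rests on several assumptions that are not taken from the text: higher scores are treated as more favourable for both TAP and MHC-I, the filter keeps a fixed fraction of the candidates, and the weight and the fraction are free parameters without recommended values. All names and numbers are hypothetical.

```python
import math


def weighted_tap_score(n_term_score, c_term_score, n_term_weight):
    """Combine the N- and C-terminal contributions, down-weighting the
    N-terminal part when n_term_weight is below 1."""
    return n_term_weight * n_term_score + c_term_score


def two_step_prediction(candidates, tap_scores, mhc_scores, tap_keep_fraction):
    """Step 1: keep the best tap_keep_fraction of candidates by TAP score.
    Step 2: rank the survivors by MHC-I score (best first).

    Assumption: higher scores are better for both predictions.
    """
    ranked_by_tap = sorted(candidates, key=lambda c: tap_scores[c], reverse=True)
    n_keep = max(1, math.ceil(tap_keep_fraction * len(ranked_by_tap)))
    survivors = ranked_by_tap[:n_keep]
    return sorted(survivors, key=lambda c: mhc_scores[c], reverse=True)


# Toy example with made-up scores for four candidate peptides.
toy_candidates = ["pepA", "pepB", "pepC", "pepD"]
toy_tap = {"pepA": 2.0, "pepB": -1.0, "pepC": 0.5, "pepD": 1.0}
toy_mhc = {"pepA": 0.1, "pepB": 0.9, "pepC": 0.7, "pepD": 0.3}
toy_ranking = two_step_prediction(toy_candidates, toy_tap, toy_mhc, 0.75)
```

In the toy data, `pepB` has the best MHC-I score but the worst TAP score, so it is removed by the filter before the MHC-I ranking is applied.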

Such a two-step prediction protocol failed when predictions of C-terminal proteasomal cleavages were used as the filter, i.e. relying on MHC-I affinity predictions alone gave better results than combining them with proteasomal cleavage predictions. This disappointing result is thought to be caused by the lack of a sufficiently large set of quantitative and consistent experimental data on cleavage rates, which are more difficult to measure and interpret than the affinity assays used to characterize peptide binding to TAP and MHC-I. Therefore, in the last chapter a new protocol for the evaluation of proteasomal digests was developed and applied to a series of experiments. The first problem addressed in this protocol is the quantification of data from MS experiments. As the signal strength detected for a peptide depends not only on its amount but also on its chemical properties, additional information is needed to quantify a signal, which usually requires extra measurements in the form of calibration curves. To avoid these additional measurements, a novel method based on mass-balance equations was introduced, which demands that the total amount of peptides having one sequence position in common be conserved throughout the digest. This allowed for reasonable estimates of the peptide amounts from the MS signals in a digest.
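The mass-balance idea can be sketched as a small least-squares problem. The sketch assumes, beyond what is stated above, that the amount of each peptide is proportional to its MS signal with an unknown peptide-specific factor, and that all peptides covering a given substrate position were detected. For every time point and every position, the converted signals of all peptides covering that position must then sum to the initial substrate amount. The function names and the toy digest are hypothetical.

```python
def solve_linear(a, b):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("mass-balance equations do not determine all factors")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def estimate_factors(fragments, signals, substrate_length, total_amount):
    """Least-squares estimate of the signal-to-amount factor of each fragment.

    fragments: list of (start, end) half-open residue ranges in the substrate.
    signals:   one list of MS signals (one per fragment) for each time point.
    """
    rows, rhs = [], []
    for signal_t in signals:
        for pos in range(substrate_length):
            rows.append([signal_t[i] if start <= pos < end else 0.0
                         for i, (start, end) in enumerate(fragments)])
            rhs.append(total_amount)
    n = len(fragments)
    # Normal equations (A^T A) c = A^T b of the overdetermined system.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(n)]
    return solve_linear(ata, atb)


# Toy digest: a 4-residue substrate cut once in the middle; the signals were
# generated from amounts with factors 2.0, 4.0 and 0.5.
toy_fragments = [(0, 4), (0, 2), (2, 4)]
toy_signals = [[5.0, 0.0, 0.0], [3.0, 1.0, 8.0], [1.0, 2.0, 16.0]]
toy_factors = estimate_factors(toy_fragments, toy_signals, 4, 10.0)
```

With real data the equations will not be satisfied exactly, and the least-squares solution then gives the factors that violate the mass balance the least.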

Based on these quantified data, the first kinetic model of the 20S proteasome was developed that is capable of providing a satisfactory quantitative description of the whole time course of product formation measured in an in vitro digest. As known from conventional enzyme kinetics, the minimum ingredients to establish an enzyme-kinetic model are (1) the maximum activity characterizing the catalytic step of the enzyme under ideal working conditions (e.g. substrate saturation) and (2) the affinity characterizing the strength of interaction between enzyme and substrate. These two essential parameters have been incorporated into the proteasome model in terms of the parameters procession rate and peptide-bond cleavage probability. The crucial advantage of this model-based approach is that it can differentiate between non-specific changes of the procession rate and peptide-bond specific kinetic effects. Changes of the procession rate alone may lead to an increase or decrease in the amount of a specific peptide relative to others only if re-processing takes place, which is a typical situation under in vitro conditions. In vivo, re-processing of fragments is unlikely in view of the enormous amount of peptidase activity present in the cytosol. In this case, changes of the procession rate alone would result in a uniform increase or decrease of all fragments without affecting the relative proportions between them. Hence, a preponderance or repression of specific peptides (e.g. epitopes) over others can only be achieved by changes of the cleavage probability.
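The argument about relative proportions can be made concrete with a toy calculation. The following is explicitly not the kinetic model developed in this work; it is a deliberately simplified sketch that assumes first-order consumption of the substrate at the procession rate, independent cleavage at each peptide bond, and no re-processing of fragments. All names and parameter values are hypothetical.

```python
import math


def fragment_probabilities(length, cleavage_prob):
    """Probability that one processed substrate molecule yields fragment
    (start, end), assuming each bond k (between residues k-1 and k) is cut
    independently with probability cleavage_prob.get(k, 0.0)."""
    probs = {}
    for start in range(length):
        for end in range(start + 1, length + 1):
            p = 1.0
            if start > 0:
                p *= cleavage_prob.get(start, 0.0)  # cut at the N-terminal end
            if end < length:
                p *= cleavage_prob.get(end, 0.0)    # cut at the C-terminal end
            for k in range(start + 1, end):
                p *= 1.0 - cleavage_prob.get(k, 0.0)  # internal bonds stay intact
            if p > 0.0:
                probs[(start, end)] = p
    return probs


def fragment_amounts(length, cleavage_prob, procession_rate, time, substrate0=1.0):
    """Fragment amounts at a given time without re-processing.

    Assumption: substrate is consumed with first-order kinetics, so the
    processed amount is substrate0 * (1 - exp(-procession_rate * time)).
    The entry (0, length) is substrate that was processed but not cut.
    """
    processed = substrate0 * (1.0 - math.exp(-procession_rate * time))
    return {frag: processed * p
            for frag, p in fragment_probabilities(length, cleavage_prob).items()}


# Same cleavage probabilities, two different procession rates.
toy_cleavage = {1: 0.5, 2: 0.2}
slow = fragment_amounts(4, toy_cleavage, procession_rate=0.1, time=1.0)
fast = fragment_amounts(4, toy_cleavage, procession_rate=0.3, time=1.0)
```

Under these assumptions the faster procession rate raises every fragment amount by the same factor, so the ratio between any two fragments is unchanged; only a change in `toy_cleavage` alters the proportions.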

The analyzed proteasomal digests provide evidence that immunoproteasomes have a consistently higher procession rate than constitutive proteasomes. The cleavage patterns for both types of proteasome are rather similar: all cleavage sites are found to be used by both types, and only a minority show significant changes in their probability of usage. However, the analysis of just two rather short model substrates does not allow for the generalization of these results. Also, many more substrates will have to be analyzed to obtain a training base large enough to establish a new prediction algorithm for proteasomal cleavage.

Characterizing each element in the MHC-I pathway and combining predictions of their function is not the only possible approach towards a sequence-based prediction of epitopes. It is also possible to identify sequence motifs common to all epitopes presented by a specific MHC-I allele, as realized in the SYFPEITHI database (Rammensee et al., 1999), and use this information for prediction. This approach does not differentiate between the influences of the proteasome, TAP or MHC-I on epitope selection, but has been shown to work well in practice. However, it has a fundamental drawback, as epitope sequences do not contain the full information used in the presentation pathway: the epitope may originate from a group of N-terminally extended precursors, generated by the proteasome, partially trimmed by cytosolic peptidases, transported by TAP into the ER and then cut to final size. These steps preceding binding to the MHC-I receptor will depend on sequence motifs in the flanking regions up- and downstream of the epitope, which are neglected when considering only the epitope sequences themselves. Hence, developing prediction algorithms for each individual step of the MHC-I presentation pathway and combining them should in principle be the superior approach. However, high-quality experimental data for each step and advanced prediction techniques are needed to rival the prediction quality currently achieved by SYFPEITHI. Unfortunately, the predictive quality of the two approaches cannot be compared here, as there is no independent blind set available: SYFPEITHI is trained on the data used as test sets for the combined predictions developed in this work. For a neutral comparison, a sufficiently large set of newly identified naturally presented epitopes would be needed, or an older version of the SYFPEITHI prediction algorithm would have to be used and tested on more recently included epitopes. As a consequence, no conclusions about which method is currently better at identifying epitopes can be drawn here.

When applying an epitope prediction protocol that is based on algorithms for several individual steps of the MHC-I presentation pathway, it is of utmost importance that each prediction algorithm is trained on data containing only information on that specific step. For example, prediction methods that are supposed to predict MHC-I binding, but have been trained on data including epitope presentation, implicitly predict the effects of TAP and the proteasome. A combination of such an 'impure' MHC-I binding prediction with a prediction of TAP transport or proteasomal cleavage thus bears the risk of misjudging the role of TAP or the proteasome in the presentation pathway.

The improvements achieved when including TAP transport of precursors into epitope predictions are in the high sensitivity regime of the ROC curve (cf. Figure 14). It is often argued that high sensitivity of epitope predictions is of less practical relevance than high specificity, i.e. that ending up with a short list of high-probability epitope candidates for a given protein sequence is all-important. This view is wrong for two reasons. First, from the medical point of view, it can be equally interesting to identify all possible epitopes within a given protein sequence, which requires high sensitivity of the predictions. Second, when combining predictions for several steps of the MHC-I pathway, whereby predictions of one step are used as a filter for the input to the next, it is very important to throw out as few true epitopes in each step as possible. Such a multi-step prediction protocol automatically increases specificity from one step to the next.
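The last point can be checked with a short calculation, given here as an illustrative sketch with made-up data. Because each filter can only remove candidates, the number of false positives can never grow from one step to the next, so specificity cannot decrease, while sensitivity cannot increase. The function names, the integer candidate identifiers and the two toy filters are hypothetical; the sketch assumes that the candidate set contains at least one true epitope and at least one non-epitope.

```python
def sensitivity_specificity(candidates, kept, true_epitopes):
    """Sensitivity and specificity of a prediction that keeps the set `kept`."""
    tp = len(kept & true_epitopes)
    fp = len(kept - true_epitopes)
    fn = len(true_epitopes - kept)
    tn = len(candidates - kept - true_epitopes)
    return tp / (tp + fn), tn / (tn + fp)


def run_filter_chain(candidates, true_epitopes, filters):
    """Apply the filters one after another and record (sensitivity,
    specificity) after each step."""
    all_candidates = set(candidates)
    kept = set(all_candidates)
    history = []
    for passes in filters:
        kept = {c for c in kept if passes(c)}
        history.append(sensitivity_specificity(all_candidates, kept, true_epitopes))
    return history


# Toy data: ten candidates identified by integers, three of them true epitopes.
toy_candidates = set(range(10))
toy_true = {0, 1, 2}
toy_filters = [
    lambda c: c < 8,             # lenient first step: loses no true epitope
    lambda c: c < 5 and c != 2,  # stricter second step: loses one true epitope
]
toy_history = run_filter_chain(toy_candidates, toy_true, toy_filters)
```

In the toy data the lenient first filter keeps all true epitopes, whereas the stricter second filter buys its gain in specificity with the loss of one true epitope that no later step can recover.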


DiML DTD Version 3.0. Certified document server of the Humboldt-Universität zu Berlin. HTML generated: 26.11.2004