
Peptide transport by TAP

The TAP transporter is the main supplier of peptides that bind to MHC-I molecules. Because TAP transport efficiency depends on the sequence of the transported peptide, the preference of TAP shapes the pool of peptides available for MHC-I binding. How important this influence is in vivo is still subject to debate. Previous attempts to identify epitopes by their enhanced predicted TAP transport efficiency showed large allele-specific differences. This led to the conclusion that either MHC-I alleles are loaded to different degrees by TAP-independent transport (Brusic, et al., 1999), or that varying amounts of epitopes are transported as N-terminally extended precursors (Lauvau, et al., 1999). The second explanation has been shown to hold for several epitopes (Goldberg, et al., 2002; Lauvau, et al., 1999) and receives further support from the identification of the protease ERA(A)P, which is responsible for the N-terminal trimming of precursors in the ER (Saric, et al., 2002; Serwold, et al., 2002; York, et al., 2002). Motivated by this, a novel method to predict the effective transport of potential epitopes is developed in this chapter, based on the predicted transportability of the epitopes themselves and of their precursors.

This chapter consists of the following sections: First, an overview of existing methods to predict TAP affinities is given, which can be considered equivalent to predictions of TAP transport (section 3.1). The prediction quality of these methods is compared with that of a new SMM-type scoring matrix established on a set of 9-mer peptides (section 3.2). In section 3.3 these predictions are generalized to be applicable to peptides of any length. This is the basis of a scoring algorithm that discriminates between presented epitopes and random sequences by their TAP transportability (section 3.4). Finally, it is shown in section 3.5 that epitope identification with combined predictions of TAP transportability and MHC-I affinity gives better results than predictions using MHC-I affinity alone.

Most of the results presented in this chapter are taken from (Peters, et al., 2003).

3.1  Published prediction methods of in vitro TAP affinity

TAP transport rates can be determined experimentally using transport assays (Nijenhuis, et al., 1996; Wang, et al., 1998) in which the transported peptides are trapped in the ER (e.g. by glycosylation). However, these assays measure not only transport but also further degradation in the ER, export from the ER and other side effects (Uebel and Tampe, 1999). Another experimental possibility is the use of in vitro affinity assays (Gubler, et al., 1998; Uebel, et al., 1997), in which the affinity has been shown to correspond closely to the transport rate of TAP (Gubler, et al., 1998). Affinity data are easier to measure and interpret, which makes it possible to gather comparatively large datasets, and are therefore the basis of this work. In the following, the correspondence between TAP transport and affinity is taken to be exact, so that predictions of TAP affinity can be equated with predictions of TAP transport.

To characterize the preference of TAP for 9-meric peptides, two scoring matrices were derived directly from experiments. The 'Ala-matrix' was constructed by using the peptide AAASAAAAY as a reference and measuring IC50 values for the peptides in which one amino acid was exchanged at one of the 9 sequence positions (Daniel, et al., 1997; Gubler, et al., 1998). The 'Mix-matrix' was generated using libraries of 9-meric peptides X1X2...Y...X9, where Xi stands for a mixture of different amino acids and Y is a specific amino acid occupying a fixed sequence position (Uebel, et al., 1997). These libraries compete for binding with the totally randomized peptide library X1X2...X9. Similar to the scoring matrices derived in chapter 1, the entries in these matrices can be summed up to predict the affinity of any 9-meric peptide.

In (Daniel, et al., 1998), an ANN was trained on TAP binding data from a set of peptides. The ANN predictions were compared to those made by the Ala-matrix, and were shown to be slightly but significantly better.

3.2  Comparison of affinity predictions for 9-mers

The TAP affinity predictions of the two experimentally derived matrices described above were compared on a set of 430 peptides with measured IC50 values (Daniel, et al., 1998). The resulting scatter plots are depicted in Figure 11, showing that the Mix-matrix makes significantly better predictions than the Ala-matrix.



Figure 11: Comparison of predicted and measured in vitro TAP affinity values of 9-mer peptides

The scatterplots depict the observed log(IC50) values of 430 9-meric peptides versus predicted log(IC50) values using the scoring matrix indicated at the bottom right of each panel. The solid curves represent linear regression lines.

With the measured IC50 values of the 430 peptides, it is possible to establish an SMM matrix as described in section 2.4. To compare the prediction quality of this method with that of the two matrices derived directly from experiments, five different SMM matrices were established, each trained on a subset of the 430 peptides. For each of these five subsets, an optimal λ was determined by cross-validation, as shown for one subset in Figure 12. Each of the five SMM matrices was then used to predict the IC50 values of the peptides not included in its training data. The resulting scatter plot is also depicted in Figure 11, which shows that the SMM matrix makes significantly better predictions than the Ala- or Mix-matrix.
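The hold-out scheme described above, in which every peptide is predicted by a matrix that was trained without it, can be sketched as follows. This is only an illustration: `fit` and `predict` are placeholders for whatever matrix-fitting routine is used (the SMM fit itself is not reproduced here), the interleaved fold assignment is an assumption, and `fit_mean`/`predict_mean` are a trivial hypothetical baseline included only to make the sketch runnable.

```python
def out_of_fold_predictions(peptides, targets, fit, predict, k=5):
    """Predict every peptide with a model that never saw it during training."""
    n = len(peptides)
    predictions = [None] * n
    for fold in range(k):
        # Assumption: peptides are assigned to folds by index modulo k.
        held_out = [i for i in range(n) if i % k == fold]
        training = [i for i in range(n) if i % k != fold]
        model = fit([peptides[i] for i in training],
                    [targets[i] for i in training])
        for i in held_out:
            predictions[i] = predict(model, peptides[i])
    return predictions


# Hypothetical baseline "model": always predicts the mean training value.
def fit_mean(train_peptides, train_targets):
    return sum(train_targets) / len(train_targets)


def predict_mean(model, peptide):
    return model
```

Comparing the returned predictions with the measured log(IC50) values then gives a scatter plot that is not biased by training on the test peptides.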



Figure 12: Cross-validated distance between measured and predicted TAP affinities as a function of λ

The distance between SMM matrix predictions and measured IC50 values in five-fold cross validation is plotted. The best predictions are made for λopt=5.

By averaging over the three scoring matrices, the 'consensus-matrix' (Table 4) is generated, which is expected to give better predictions than the individual matrices because their errors can partially cancel each other. A scatter plot for its predictions is also shown in Figure 11. As expected, the consensus-matrix gives the best results, although the SMM matrix is only marginally worse. The ANN predictions from (Daniel, et al., 1998) were not available for a direct comparison. However, as they were only slightly better than those of the Ala-matrix, which makes the worst predictions of the three individual matrices, it can be assumed that the consensus-matrix predictions are at least as good as those made by the ANN.
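As a minimal sketch of the averaging step, assuming the three matrices are stored as dictionaries that map each residue to a list of per-position scores and are already on a common scale (both are assumptions, and the function name is hypothetical):

```python
def average_matrices(matrices):
    """Element-wise mean of equally shaped {residue: [score per position]} matrices."""
    first = matrices[0]
    return {
        residue: [
            sum(m[residue][pos] for m in matrices) / len(matrices)
            for pos in range(len(first[residue]))
        ]
        for residue in first
    }
```

Called with the Ala-, Mix- and SMM matrices, this would return one entry per residue and position, in the layout of Table 4.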



Table 4: TAP consensus matrix

| Residue | (N1) Pos 1 | (N2) Pos 2 | (N3) Pos 3 | Pos 4 | Pos 5 | Pos 6 | Pos 7 | Pos 8 | (C) Pos 9 |
|---|---|---|---|---|---|---|---|---|---|
| A | -1.56 | -0.25 | -0.10 | 0.24 | -0.10 | 0.17 | 0.27 | -0.00 | 0.55 |
| C | 0.05 | -0.01 | -0.02 | 0.11 | 0.09 | 0.05 | 0.00 | -0.13 | 0.00 |
| D | 1.37 | 1.42 | 1.83 | -0.23 | 0.33 | 0.32 | 1.07 | 0.32 | 1.83 |
| E | 1.65 | 0.02 | 1.51 | 0.08 | 0.54 | -0.13 | 0.64 | 0.44 | 1.58 |
| F | 1.03 | -0.45 | -1.05 | -0.50 | -0.26 | 0.08 | -0.50 | 0.17 | -2.52 |
| G | 0.28 | 1.14 | 1.70 | 0.45 | 0.66 | 0.12 | 1.41 | -0.38 | 1.41 |
| H | 0.21 | 0.33 | -0.23 | -0.21 | -0.11 | -0.06 | -0.19 | 0.39 | 0.55 |
| I | -0.11 | -0.49 | -0.62 | -0.09 | -0.42 | -0.75 | -0.94 | 0.45 | -0.52 |
| K | -1.03 | -0.41 | 0.09 | -0.23 | -0.08 | -0.26 | 0.44 | 0.12 | -0.45 |
| L | -0.50 | 0.09 | -0.11 | 0.11 | -0.34 | 0.02 | -0.73 | 0.01 | -0.94 |
| M | -0.38 | -0.46 | -0.58 | -0.35 | -0.26 | 0.30 | -0.64 | -0.11 | -0.29 |
| N | -1.43 | 0.69 | 1.01 | 0.38 | 0.49 | -0.27 | 0.16 | 0.33 | 1.33 |
| P | 1.43 | 3.00 | 0.22 | -0.04 | -0.72 | -0.13 | -0.84 | 0.03 | -0.09 |
| Q | 0.47 | -0.97 | 0.39 | 0.15 | 0.15 | -0.07 | 0.34 | 0.26 | 0.12 |
| R | -1.34 | -1.47 | -0.42 | -0.27 | -0.32 | -0.75 | -0.09 | -0.42 | -1.47 |
| S | -0.56 | -0.34 | 0.11 | 0.27 | 0.45 | 0.31 | 0.87 | -0.51 | 2.26 |
| T | -0.12 | -0.04 | 0.43 | 0.23 | 0.43 | 0.49 | 0.39 | -0.46 | 0.72 |
| V | -0.49 | -0.50 | -0.71 | 0.27 | 0.37 | -0.02 | -0.29 | 0.10 | -0.30 |
| W | 0.54 | -0.64 | -1.65 | -0.18 | -0.78 | 0.31 | -0.50 | -0.63 | -0.87 |
| Y | 0.50 | -0.67 | -1.80 | -0.18 | -0.13 | 0.28 | -0.87 | 0.02 | -2.91 |

3.3  Predictions of TAP affinities for longer peptides

It has been described in the literature that binding of peptides to TAP is mainly influenced by their C-terminal residue and their three N-terminal residues (Daniel, et al., 1998; Uebel, et al., 1997; Uebel and Tampe, 1999; van Endert, et al., 1995). Motivated by this, a new scoring scheme to predict IC50 values of peptides with more than 9 residues is introduced, which neglects the influence of 'inner' residues: TAP affinities of peptides of arbitrary length are calculated by scoring only the C-terminus and the three N-terminal residues using the four corresponding columns of the 9-mer matrix. Thus, for a peptide with the amino acid sequence N1, N2, N3, N4, …, C the TAP score t is given by

$$t = \mathrm{mat}_{1,N_1} + \mathrm{mat}_{2,N_2} + \mathrm{mat}_{3,N_3} + \mathrm{mat}_{9,C} \qquad (8)$$

where $\mathrm{mat}_{i,X}$ denotes the score of residue X at sequence position i.

Figure 13: Comparison of predicted and measured in vitro TAP affinity values for peptides longer than 9 amino acids.

The scatterplots depict the observed log(IC50) values of 64 peptides versus theoretical log(IC50) values predicted using the scoring matrix indicated at the bottom right of each panel. The length distribution of the peptides was as follows: 36 10-mers, 18 11-mers, 6 12-mers, and one 13-, 15-, 16-, and 18-mer. The solid curves represent linear regression lines.



To test how well equation (8) predicts TAP affinities of peptides with more than 9 residues, it was applied to 64 peptides between 10 and 18 amino acids in length with measured affinities. As shown in Figure 13, the correlation between predicted and measured affinity values is lower than for the 9-mers, but still significant. The consensus matrix again provided higher correlation than all other matrices, so that it was used in all further applications.
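The scoring scheme for peptides of arbitrary length can be sketched as below. The dictionary holds only an excerpt of Table 4 (columns Pos 1, Pos 2, Pos 3 and Pos 9 for nine residues); the names `CONSENSUS` and `tap_score` and the minimum-length check are assumptions made for this illustration.

```python
# Excerpt of Table 4: (Pos 1, Pos 2, Pos 3, Pos 9) scores per residue.
# The remaining residues can be added from the table in the same way.
CONSENSUS = {
    "A": (-1.56, -0.25, -0.10, 0.55),
    "D": (1.37, 1.42, 1.83, 1.83),
    "F": (1.03, -0.45, -1.05, -2.52),
    "K": (-1.03, -0.41, 0.09, -0.45),
    "L": (-0.50, 0.09, -0.11, -0.94),
    "P": (1.43, 3.00, 0.22, -0.09),
    "R": (-1.34, -1.47, -0.42, -1.47),
    "S": (-0.56, -0.34, 0.11, 2.26),
    "Y": (0.50, -0.67, -1.80, -2.91),
}


def tap_score(peptide):
    """Sum the scores of the three N-terminal residues and the C-terminal residue.

    Inner residues are ignored, so the peptide may have any length.
    Lower scores correspond to lower predicted log(IC50), i.e. higher affinity.
    """
    if len(peptide) < 4:
        # Assumption: at least four residues are needed for four distinct positions.
        raise ValueError("peptide too short")
    n1, n2, n3, c = peptide[0], peptide[1], peptide[2], peptide[-1]
    return (CONSENSUS[n1][0] + CONSENSUS[n2][1]
            + CONSENSUS[n3][2] + CONSENSUS[c][3])
```

For the Ala-matrix reference peptide, `tap_score("AAASAAAAY")` adds -1.56, -0.25, -0.10 and -2.91, giving approximately -4.82.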

3.4  Using TAP transport predictions for the identification of epitopes

To assess the selective role of TAP within the MHC-I presentation pathway, a test set of known naturally processed epitopes is needed. This set is taken from the SYFPEITHI database (Rammensee, et al., 1999) and contains all known 9-meric epitopes that are presented naturally by any human MHC-I allele and for which the sequence of the source protein is available, except those presented by HLA-A0201 (which are used later on). MHC-I ligands that are known to bind but are not presented naturally are excluded, as are epitopes derived from signal sequences. All other 9-mers contained in the protein sequences from which the epitopes originated are taken as random control peptides (= non-epitopes). In the following, this set of 203 epitopes and more than 60,000 random 9-mers is referred to as the HLA-X dataset.

To measure the prediction quality, again ROC curves and their integral (AUC) are used (section 2.5). First, the complete 9-mer consensus matrix is used to predict the TAP affinities of all 9-mers in the HLA-X dataset. These affinities are then used to separate epitopes from random 9-mers, resulting in the ROC curve plotted in Figure 14, curve (a), which corresponds to an AUC value of 0.702, indicating a relevant but not very good prediction.
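The AUC used here can be computed directly from two lists of scores, since the area under the ROC curve equals the probability that a randomly chosen epitope is ranked ahead of a randomly chosen control peptide. The sketch below assumes that lower scores are better (lower predicted log(IC50)) and counts ties as one half; the function name is hypothetical, and the quadratic loop is chosen for clarity rather than speed.

```python
def auc_lower_is_better(epitope_scores, control_scores):
    """Fraction of (epitope, control) pairs in which the epitope scores lower.

    Ties contribute 0.5. A value of 0.5 means no discrimination,
    1.0 means every epitope scores better than every control peptide.
    """
    wins = 0.0
    for e in epitope_scores:
        for c in control_scores:
            if e < c:
                wins += 1.0
            elif e == c:
                wins += 0.5
    return wins / (len(epitope_scores) * len(control_scores))
```

For a dataset of the size described above (about 200 epitopes against 60,000 controls) the pairwise loop is still feasible; a rank-based implementation would be preferable for much larger sets.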



Figure 14: ROC curves for the HLA-X dataset.

Curve (a) was constructed using the entire consensus-matrix on the HLA-X dataset yielding an AUC value of 0.702. For curve (b) scoring equation (10) was used with α=0.2 and L=10, giving AUC=0.791. The improvement is nearly completely in the high sensitivity region. The arrow indicates the point in curve (b) which corresponds to the sensitivity and specificity reached when choosing the cutoff=1, which is used later in the combined TAP and MHC-I predictions.

The same analysis was repeated, now including potential epitope precursors carrying N-terminal extensions. TAP affinities for N-terminally extended precursors of length 9, 10, ..., L were calculated for all epitopes and non-epitopes by means of equation (8). The TAP transport score of a potential 9-mer epitope is obtained by averaging over the TAP affinities of the epitope itself and its precursors up to a maximal length L:

$$T_L = \frac{1}{L-8} \sum_{l=9}^{L} t_l \qquad (9)$$

where $t_l$ is the score from equation (8) of the precursor of length l, i.e. the 9-mer extended by l − 9 residues at its N-terminus.



Note that all precursors contribute to the transport score with identical C-termini, while their N-terminal contributions vary. Successively increasing the maximal precursor length L and using the corresponding TAP transport scores to discriminate between epitopes and non-epitopes yields the AUC values depicted in Figure 15, curve (a). For L=9 (no N-terminal extension), equation (9) is equivalent to equation (8) and the AUC value amounts to 0.700, which is only marginally lower than the value of 0.702 obtained when using the complete consensus matrix. This finding further justifies the use of equation (8).

Figure 15: Prediction quality for the HLA-X dataset as a function of the maximal precursor length

Plotted is the prediction quality measured by the AUC of the TAP transport score for different predictions: (a) equal weight for N- and C-terminus (equation 9) (b) C-terminus score only (equation 10, α=0) (c) optimal prediction with down-weighted N-terminus (equation 10, α=0.2)

The AUC values improve significantly with increasing maximal precursor length L. This was not expected for L greater than 18, as the TAP transport efficiency for peptides exceeding this length has been shown to drop off significantly (van Endert, et al., 1994). Evidently, as the possible length L of epitope precursors is increased step by step, the average over their N-terminal scores converges to a stable limit, so that N-terminal scoring becomes less and less important for the prediction of TAP affinities. Hence, in the limit L → ∞, only the C-terminus accounts for differences in the TAP scores of different potential epitopes. To see how close this limit is, the AUC values were calculated using only the C-terminus for scoring (Figure 15, curve (b)). Surprisingly, the resulting AUC value of 0.782 is higher than all AUC values obtained before. This finding raises the question of whether the rise in AUC values seen with increasing precursor length L really reflects the usage of longer precursors in antigen production, or whether the N-terminal scores merely add noise to the prediction, which is smoothed out with increasing L. To check this, the TAP transport scores of the N-terminal residues were weighted by a factor α:

$$T_{L,\alpha} = \mathrm{mat}_{9,C} + \frac{\alpha}{L-8} \sum_{l=9}^{L} \left( \mathrm{mat}_{1,N_1^{(l)}} + \mathrm{mat}_{2,N_2^{(l)}} + \mathrm{mat}_{3,N_3^{(l)}} \right) \qquad (10)$$

where $N_1^{(l)}, N_2^{(l)}, N_3^{(l)}$ are the three N-terminal residues of the precursor of length l.

In Figure 15, curve (a) corresponds to α = 1 and curve (b) corresponds to α = 0. If the increase in AUC values obtained with precursors of length L>9 were only an artifact, one would expect the AUC for all values of L to grow monotonically when decreasing α from one to zero. Otherwise, one would expect to find the optimal value of α somewhere between one and zero. The latter is the case: a maximum AUC value was obtained for α=0.2 (curve (c) in Figure 15), which was significantly above the AUC value obtained when scoring only the C-terminus. Curve (b) in Figure 14 depicts the ROC curve obtained when choosing the optimal values L=10 (i.e. one N-terminal extension) and α=0.2. Hence, predicting TAP affinities of N-terminally extended epitope precursors with their N-terminal scores down-weighted relative to their C-terminal scores significantly improves the discrimination between epitopes and non-epitopes. Possible explanations for the 'down-weighting' of the N-terminus are analyzed below.
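A minimal sketch of the weighted transport score follows: the C-terminal score of the 9-mer plus α times the mean N-terminal score over the 9-mer and its N-terminally extended precursors up to length L. The matrix is an excerpt of Table 4 (Pos 1, Pos 2, Pos 3, Pos 9); the function and variable names are hypothetical, and skipping precursors that would extend beyond the start of the protein is an assumption not stated in the text.

```python
# Excerpt of Table 4: (Pos 1, Pos 2, Pos 3, Pos 9) scores per residue.
MATRIX = {
    "A": (-1.56, -0.25, -0.10, 0.55),
    "K": (-1.03, -0.41, 0.09, -0.45),
    "R": (-1.34, -1.47, -0.42, -1.47),
    "S": (-0.56, -0.34, 0.11, 2.26),
    "Y": (0.50, -0.67, -1.80, -2.91),
}


def transport_score(protein, start, max_len=10, alpha=0.2):
    """Transport score of the 9-mer protein[start:start + 9].

    alpha = 1 weights N- and C-terminus equally, alpha = 0 scores only
    the C-terminus. Lower scores mean better predicted transport.
    """
    end = start + 9
    c_term = MATRIX[protein[end - 1]][3]
    n_term_scores = []
    for length in range(9, max_len + 1):
        begin = end - length
        if begin < 0:
            break  # assumption: ignore precursors running past the protein start
        n1, n2, n3 = protein[begin], protein[begin + 1], protein[begin + 2]
        n_term_scores.append(MATRIX[n1][0] + MATRIX[n2][1] + MATRIX[n3][2])
    return c_term + alpha * sum(n_term_scores) / len(n_term_scores)
```

With `max_len=9` and `alpha=1` the function reduces to the plain four-position score of the 9-mer, which is the consistency check mentioned in the text for L=9.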

To exclude the possibility that the improvement in predictions obtained when choosing α < 1 is a specific property of the HLA-X dataset, the same scoring procedure was applied to a completely independent set of mouse epitopes. This H2-X dataset was also extracted from the SYFPEITHI database following the same rules as those for the HLA-X dataset, but using mouse instead of human MHC-I alleles. Again, an attempt is made to separate epitopes from random 9-mers using the predicted TAP transport efficiency (Figure 16), which is based on measurements of human TAP specificity. It has been shown that there are significant differences between the murine and human TAP specificity (Momburg, et al., 1994), as human TAP translocates peptides with hydrophobic and basic C-termini, whereas mouse TAP prefers only peptides with hydrophobic C-termini. As expected, this results in generally lower AUC values than those for the HLA-X dataset. Nevertheless, the three curves (a)-(c) in Figure 16 are qualitatively related to each other in exactly the same way as those shown in Figure 15 for the HLA-X dataset: Using the scores for the N- and C-terminus with equal weights (α=1) for the prediction of TAP affinities results in a worse discrimination between epitopes and non-epitopes than neglecting the N-terminus completely (α=0). Again, a better prediction is achieved when the scores for the N-terminus are down-weighted with α=0.2.

Figure 16: Prediction quality for the H2-X dataset as a function of the maximal precursor length

Plotted is the prediction quality measured by the AUC of the TAP transport score given in equation (10) for different predictions: (a) equal weight for N- and C-terminus (α=1) (b) C-terminus score only (α=0) (c) better prediction with down-weighted N-terminus (α =0.2)



3.4.1  TAP transport predictions for individual MHC-I alleles

The calculations made in the previous section were repeated for individual MHC-I alleles that make up the HLA-X dataset to see how much the results vary. This analysis was restricted to those alleles for which at least 10 epitopes are present in the HLA-X dataset (Table 5). Epitopes presented by different allele subtypes were pooled in one set, for example the 'HLA-B27' set consists of epitopes listed in the SYFPEITHI database to be presented by HLA-B27 (unknown subtype) and the subtypes HLA-B2702, HLA-B2704 and HLA-B2705. While the binding preference of the allele subtypes can vary slightly, the datasets would otherwise be too small, especially as for many entries in the SYFPEITHI database the four digit code identifying the exact subtype is not given. The only exception is the HLA-A0201 set, for which only epitopes presented by this allele subtype are included.

First, it was studied how well the epitopes of each individual allele can be identified by TAP affinity scores computed without inclusion of possible precursors or down-weighting of the N-terminal residues (i.e. setting L=9 and α=1 in equation (10)). The resulting AUC values (Table 5) show huge variations from 0.39 to 0.89. The differences in prediction quality for the individual alleles correspond very well with those reported in (Brusic, et al., 1999; Daniel, et al., 1998), where the alleles HLA-B27, -A3 and -A24 were classified as efficient for TAP loading (high AUC) and the alleles HLA-B07, -B08 and -A0201 were classified as inefficient for TAP loading (low AUC).

Repeating the AUC calculations with the optimal parameters L=10 and α=0.2 obtained for the entire HLA-X dataset, the AUC values fall in a much narrower range between 0.71 and 0.88, i.e. a subdivision into TAP-efficient and TAP-inefficient alleles is no longer preserved. These results provide evidence that TAP plays an equally important role for peptide loading of all alleles considered. Intriguingly, some alleles such as HLA-B27 or HLA-A3 seem to be preferentially loaded with peptides directly imported from the cytosol, whereas other alleles such as HLA-B35 or HLA-A0201 are preferentially loaded with peptides that enter the ER as N-terminally extended precursors and are trimmed to their final size there.



Table 5: Individual alleles

| Allele | # Epitopes | AUC (L=9, α=1) | AUC (L=10, α=0.2) | Optimal α for L=10 |
|---|---|---|---|---|
| HLA-B35 | 10 | 0.39 | 0.80 | 0.0 |
| HLA-B07 | 11 | 0.43 | 0.71 | 0.0 |
| HLA-B08 | 10 | 0.69 | 0.80 | 0.0 |
| HLA-B44 | 11 | 0.78 | 0.88 | 0.0 |
| HLA-A24 | 37 | 0.81 | 0.87 | 1.0 |
| HLA-A3 | 11 | 0.82 | 0.75 | 1.2 |
| HLA-B27 | 20 | 0.89 | 0.77 | 4.0 |
| HLA-A0201 | 87 | 0.65 | 0.70 | 0.4 |

Finally, the optimal value of α for each individual allele was calculated when setting L=10. The resulting values vary between 0 and 4, showing that the optimal value of α is extremely allele-specific: The better the C-terminal residues required for effective TAP transport agree with the C-terminal residues enabling effective MHC-I binding to the given allele, the lower the weight that has to be put on the N-terminal residues. The optimal value of α=0.2 for the whole HLA-X dataset shows that, on average, the C-terminal amino acid motifs required for effective TAP transport and MHC-I binding overlap more strongly than the corresponding N-terminal motifs. This is probably due to a stronger pressure for co-evolution on the C-terminal motif, as the C-terminus undergoes no change between TAP transport and MHC-I binding, while the N-terminus can be trimmed.



3.4.2  Consequences of the uncertainty as to which N-terminally extended precursors are generated in vivo

Another explanation for why better epitope predictions were achieved with α < 1 is the uncertainty as to which epitope precursors are actually transported in vivo to release the final epitope in the ER by N-terminal trimming. Equation (10) is based on the unrealistic assumption that, up to a critical length L, all N-terminally extended precursors of an epitope are present in comparable abundance. Given that several precursors are not generated in vivo, their scores for the N-terminus will 'dilute' those of the precursors that actually exist. From a statistical point of view, this favors putting a higher weight on the score of the C-terminus or, equivalently, down-weighting the scores of the N-terminal residues.

To estimate the implications of precursor uncertainty for the choice of α, simplified simulations of the MHC-I pathway were performed. Using the protein sequences from which the epitopes of the HLA-X dataset originate, a set of m fragments per sequence is generated whose lengths obey a log-normal distribution, as was observed for the cleavage products of the proteasome (Kisselev, et al., 1999). These m fragments per sequence are considered to be the pool of potential epitope precursors generated by the proteasome that contain a C-terminal 9-mer which can bind to an MHC-I molecule. Which of these fragments becomes an epitope is decided by their affinity to TAP, which is calculated using equation (8). The fragment with the highest affinity per sequence is chosen, and its 9 C-terminal residues define an epitope. The other m-1 fragments are discarded. An attempt is then made to identify these artificially generated 9-mer epitopes among all other 9-mers contained in the protein sequences by applying the TAP transport score (equation 10) at varying values of α.
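One step of such a simulation can be sketched as follows. Several details are assumptions made only for illustration: fragment end positions are drawn uniformly, lengths are rounded and clipped to the range from 9 to the protein length, `mu` and `sigma` parameterize the underlying normal distribution as required by Python's `random.lognormvariate`, and `score` stands for any affinity predictor in which lower values mean higher affinity.

```python
import math
import random


def simulate_epitope(protein, score, m=10, mu=math.log(10), sigma=0.3, rng=random):
    """Draw m fragments with log-normal lengths and return the simulated epitope.

    The fragment with the best (lowest) score is selected; its nine
    C-terminal residues define the epitope.
    """
    fragments = []
    for _ in range(m):
        length = max(9, round(rng.lognormvariate(mu, sigma)))
        length = min(length, len(protein))
        end = rng.randint(length, len(protein))  # exclusive end index of the fragment
        fragments.append(protein[end - length:end])
    best = min(fragments, key=score)
    return best[-9:]
```

Repeating this for every source protein yields an artificial epitope set on which the transport score can be evaluated for different values of α and L.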

The highest AUC values in all simulations were indeed obtained when choosing α<1. Figure 17 shows the AUC values for such a simulated dataset. In this case the highest AUC value was obtained for L=11 and α=0.6. Varying the width of the hypothetical length distribution in the simulations, the optimal α values were always between 0.6 - 0.9, i.e. larger than the value α=0.2 yielding the best prediction of epitopes on real experimental datasets but always smaller than 1.



Figure 17: Prediction quality on a simulated dataset

Plotted is the prediction quality measured by the AUC of the TAP transport score given in equation 10 for different predictions: (a) equal weight for N- and C-terminus (α=1) (b) C-terminus score only (α=0) (c) optimal prediction with down-weighted N-terminus (α=0.6)

There are three free parameters in the simulation: the number m of different fragments used to define a single epitope and the mean and standard deviation of the log-normal length distribution of peptides generated. The larger the value of m, the higher the selective power that TAP has in the pathway in comparison to the proteasome and the MHC-I molecules. By systematically increasing the value of m, it was found that with m=10 the AUC value on the basis of the TAP score for the C-terminus alone was close to those AUC values in Figure 15 and Figure 16 observed with real experimental data. The length dependence of the AUC values was in good concordance with that shown in Figure 15 and Figure 16 when choosing the mean of the log-normal length distribution in the range 9 – 11.



3.5  Combining TAP transport predictions with predictions of MHC-I affinity

It was tested whether combining predictions for two main steps of the presentation pathway, TAP transport and MHC-I binding, can improve the identification of epitopes. These calculations were performed on a set of 87 HLA-A0201-presented epitopes which had been omitted from the HLA-X dataset. For the prediction of peptide binding to HLA-A0201, the SMM scoring matrix developed in section 2.4 was used. On its own, this matrix already possesses a high capacity to identify the epitopes of the HLA-A0201 dataset (AUC = 0.919, cf. Figure 18, curve (a)).

Figure 18: ROC curves for the combined TAP and MHC-I prediction on the HLA-A0201 dataset

The black curve (a, AUC=0.919) shows the (very high) level of the MHC-I prediction alone. The consistently better gray curve (b, AUC=0.932) is obtained by classifying all 9-mers with a TAP transport score worse than 1 as not transported (α=0.2, L=10), and limiting the MHC-I prediction to the transported peptides.

To combine predictions of MHC-I binding with predictions of TAP transport, the TAP transport efficiency is first calculated for all 9-mers contained in the source sequences of the HLA-A0201 epitopes using equation (10) with the parameters L=10 and α=0.2. All 9-mers with TAP scores above the threshold value 1 are classified as not transportable and excluded from the set of epitope candidates. This cutoff value was chosen by examining the ROC curve for the HLA-X dataset: Only 1.5% of epitopes but 32% of random 9-mers have a higher (= worse) TAP score (arrow in Figure 14). In the second step, the predicted MHC-I binding scores of the remaining peptides (having TAP scores < 1) were used to discriminate between epitopes and non-epitopes. Based on this two-step prediction protocol, the AUC value increases significantly to 0.932 (Figure 18, curve (b)). The improvement is largest in the high sensitivity region: At 100% sensitivity, the specificity increases from 52% to 62% when using the combined prediction instead of the MHC-I affinity prediction alone.
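The two-step protocol can be sketched as a single combined score. It is assumed here that both scores are oriented so that lower is better, that filtered peptides are simply ranked last, and that a TAP score of exactly 1 still counts as transported; the function names and example peptides are hypothetical.

```python
import math


def combined_score(tap_score, mhc_score, cutoff=1.0):
    """TAP filter first, MHC-I ranking second (lower = better)."""
    if tap_score > cutoff:
        return math.inf  # classified as not transported, ranked last
    return mhc_score


def rank_candidates(candidates, cutoff=1.0):
    """Sort (peptide, tap_score, mhc_score) tuples from best to worst."""
    return sorted(candidates,
                  key=lambda c: combined_score(c[1], c[2], cutoff))
```

A peptide with an excellent MHC-I score but a TAP score above the cutoff therefore ends up behind every transported peptide, which is the effect the filter is meant to have.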

The same two-step prediction protocol was repeated for several mouse MHC-I alleles, using scoring matrices for the MHC-I affinity prediction that were measured by (Udaka, et al., 2000). Unfortunately, the number of epitopes available in the SYFPEITHI database per allele is small, ranging from 9 to 21 (Table 6). For three of the four mouse datasets, the combined predictions gave better AUC values than MHC-I affinity predictions alone. For one allele (H2-Db), the combined prediction was worse. This shows that the combined MHC-I + TAP prediction using a human TAP matrix works for mouse epitopes, even though there are significant differences between the murine and human TAP specificity. The predictions should improve significantly when a scoring matrix based on experimental data for murine TAP is used.

Table 6: Combined TAP and MHC-I predictions

| Dataset | # Epitopes | AUC, MHC-I only | AUC, MHC-I + TAP |
|---|---|---|---|
| HLA-A0201, 9-mers | 87 | 0.919 | 0.932 |
| H2-Kb, 8-mers | 21 | 0.961 | 0.965 |
| H2-Kb, 9-mers | 9 | 0.855 | 0.879 |
| H2-Db, 9-mers | 20 | 0.971 | 0.949 |
| H2-Ld, 9-mers | 10 | 0.985 | 0.987 |



3.6  Confidence in the values of the free parameters α and L

There are two free parameters in the prediction of TAP transport scores: α and L. Throughout most of this chapter, the values α=0.2 and L=10 were used, which were determined to be optimal for the HLA-X dataset containing epitopes from all human MHC-I alleles except HLA-A0201. These parameters show large variations when calculated for the individual alleles that make up the HLA-X dataset (Table 5), as they are heavily influenced by each individual allele's binding preference. The parameter values for the entire HLA-X dataset, which average out the individual alleles' binding preferences, should reflect the true effect of TAP more accurately. The optimal parameter values calculated for the H2-X dataset (αopt=0.02 and Lopt=11), or for the combined MHC-I and TAP predictions for HLA-A0201 (αopt=0.6 and Lopt=18), which should also reflect the true effect of TAP, are considerably different from those for the HLA-X set. However, the decrease in prediction quality when using α=0.2 and L=10 instead of the optimal parameters for these datasets is quite small (ΔAUC < 0.006) compared to the loss in prediction quality (ΔAUC ~ 0.100) when making predictions without down-weighting and without precursors (i.e. α=1, L=9). Apparently both parameters are meaningful, but their optimal values cannot be fixed within a narrow range. The use of α=0.2 and L=10 can therefore be recommended, even though, from a biological perspective, L=10 seems to be too small, as longer precursors are known to be used in vivo.

3.7 Summary

In this chapter, a novel method to predict the TAP affinity of peptides of any length was introduced, which gave reasonably good predictions for peptides 9 - 18 residues long. This was used to assign an effective TAP transport score to a potential epitope by averaging over the predicted TAP affinities of the epitope itself and its precursors. The ability of this score to discriminate between random 9-mers and presented epitopes improved when the influence of the N-terminal residues was down-weighted. This was reasoned to be a consequence of the uncertainty as to which epitope precursors are present in vivo, as well as of possible co-evolution in the preference for the peptide C-terminus of TAP and the average MHC-I molecule.



Using the predicted TAP transport efficiency to identify naturally processed epitopes for individual MHC-I alleles showed that TAP does exert significant pressure on the epitope selection of all MHC-I alleles.

To combine TAP transport predictions with those of MHC-I affinity, all potential epitopes with a predicted TAP transport efficiency considered to be 'non-transportable' are eliminated. Using this as a filter prior to MHC-I affinity predictions improved the prediction quality considerably above that of the MHC-I affinity predictions alone.




DiML DTD Version 3.0. Certified document server of the Humboldt-Universität zu Berlin. HTML generated: 26.11.2004