
Peptide Matches Filtering

Peptide matches identified in a search result can be filtered using one or more predefined filters (described below). Only validated peptide matches are considered in subsequent steps.

Basic Score Filter

All PSMs whose score is lower than a given threshold are invalidated.

Pretty Rank Filter

This filter is applied after temporarily joining the target and decoy PSMs that correspond to the same query (this is only really needed for separate forward/reverse database searches). For each query, the target and decoy PSMs are then sorted by score. A rank (the Mascot pretty rank) is computed for each PSM from its position in this ordering: PSMs with almost equal scores (difference < 0.1) are assigned the same rank. All PSMs with a rank greater than the specified one are invalidated.
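The ranking described above can be sketched as follows. This is a minimal illustration, not Proline's actual code: the dictionary keys (`query`, `score`, `pretty_rank`, `is_validated`) are hypothetical names, and two details the text leaves open are assumed here, namely that each score is compared with the first score of the current rank group and that ranks increase by one from group to group.

```python
from collections import defaultdict

def assign_pretty_ranks(psms, tolerance=0.1):
    """Attach a 'pretty_rank' to each PSM (target and decoy PSMs mixed together)."""
    by_query = defaultdict(list)
    for psm in psms:
        by_query[psm["query"]].append(psm)

    for group in by_query.values():
        # Best score first.
        group.sort(key=lambda p: p["score"], reverse=True)
        rank, ref_score = 0, None
        for psm in group:
            # Assumption: start a new rank when the score is at least
            # `tolerance` below the first score of the current rank group.
            if ref_score is None or ref_score - psm["score"] >= tolerance:
                rank += 1
                ref_score = psm["score"]
            psm["pretty_rank"] = rank
    return psms

def pretty_rank_filter(psms, max_rank):
    """Invalidate every PSM whose pretty rank is greater than max_rank."""
    assign_pretty_ranks(psms)
    for psm in psms:
        psm.setdefault("is_validated", True)
        if psm["pretty_rank"] > max_rank:
            psm["is_validated"] = False
    return psms
```

With `max_rank=1`, two PSMs of the same query scoring 50.0 and 49.95 both keep rank 1 and stay valid, while a third PSM scoring 40.0 gets rank 2 and is invalidated.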

Minimum Sequence Length Filter

PSMs corresponding to short peptide sequences (shorter than the provided minimum length) can be invalidated using this parameter.

Mascot eValue Filter

Filters PSMs using the Mascot expectation value (e-value), which reflects the difference between the PSM score and the Mascot identity threshold (p=0.05). PSMs with an e-value greater than the specified one are invalidated.
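As an illustration, the sketch below uses the commonly cited Mascot relationship E = p × 10^((threshold − score) / 10). The formula is stated here as an assumption, since this page does not spell it out, and the function and key names are hypothetical.

```python
def mascot_evalue(score, identity_threshold, p_threshold=0.05):
    """Expectation value of a PSM, assuming E = p * 10 ** ((T - S) / 10)."""
    return p_threshold * 10 ** ((identity_threshold - score) / 10)

def evalue_filter(psms, max_evalue):
    """Invalidate PSMs whose e-value is greater than max_evalue."""
    for psm in psms:
        psm.setdefault("is_validated", True)
        evalue = mascot_evalue(psm["score"], psm["identity_threshold"])
        if evalue > max_evalue:
            psm["is_validated"] = False
    return psms
```

Under this assumption, a PSM scoring exactly at the identity threshold has an e-value equal to the p-value (0.05), and every 10 score points above the threshold divide the e-value by 10.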

Mascot adjusted eValue Filter

Proline can compute an adjusted e-value. It first selects the lower of the identity and homology thresholds (both at p=0.05), then computes the e-value using the selected threshold. PSMs with an adjusted e-value greater than the specified one are invalidated.
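A minimal sketch of this two-step computation is given below. It assumes the same e-value relationship, E = p × 10^((threshold − score) / 10), and additionally assumes that the homology threshold may be missing for some queries, in which case the identity threshold is used alone. The function name and parameters are hypothetical.

```python
def adjusted_evalue(score, identity_threshold, homology_threshold=None,
                    p_threshold=0.05):
    """E-value computed against the lower of the two Mascot thresholds."""
    thresholds = [identity_threshold]
    # Assumption: a query may have no homology threshold.
    if homology_threshold is not None:
        thresholds.append(homology_threshold)
    selected = min(thresholds)
    # Assumed relationship: E = p * 10 ** ((T - S) / 10).
    return p_threshold * 10 ** ((selected - score) / 10)
```

Because the selected threshold is never higher than the identity threshold, the adjusted e-value is never greater than the plain e-value for the same PSM.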

Mascot p-value on Identity Filter

Given a specific p-value, the Mascot identity threshold is calculated for each query, and all peptide matches associated with the query whose score is lower than the calculated identity threshold are invalidated.
When a Mascot result file is parsed, the number of PSM candidates for each spectrum is saved, so the identity threshold can be recalculated for any p-value.
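The recalculation can be sketched as follows, assuming the commonly cited Mascot relationship T = −10 × log10(p / n), where n is the number of candidate peptides for the spectrum. Both the formula and the names (`candidate_count`, `is_validated`) are assumptions made for illustration; Proline's own implementation may differ.

```python
import math

def identity_threshold(candidate_count, p_value=0.05):
    """Identity threshold for a query, assuming T = -10 * log10(p / n)."""
    if candidate_count <= 0 or not 0 < p_value <= 1:
        raise ValueError("candidate_count must be > 0 and p_value in (0, 1]")
    return -10 * math.log10(p_value / candidate_count)

def identity_pvalue_filter(psms, p_value):
    """Invalidate PSMs scoring below the identity threshold of their query."""
    for psm in psms:
        psm.setdefault("is_validated", True)
        if psm["score"] < identity_threshold(psm["candidate_count"], p_value):
            psm["is_validated"] = False
    return psms
```

Under this assumption, a stricter (smaller) p-value or a larger number of candidates both raise the threshold a PSM has to reach.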

Mascot p-value on Homology Filter

Given a specific p-value, the Mascot homology threshold is inferred for each query, and all peptide matches associated with the query whose score is lower than the inferred homology threshold are invalidated.

Single PSM Per Query Filter

This filter validates only one PSM per query. The PSM is selected using the following rules:

For each query:

  • Select the PSM with the highest score.
  • If several PSMs have the same score:
    • Choose the PSM that identifies the protein with the largest number of valid PSMs.
    • If there is still a tie:
      • Choose the first PSM.
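The selection rules above can be sketched as follows. The field names (`query`, `score`, `proteins`, `is_validated`) are hypothetical, and two points are assumptions: the number of valid PSMs per protein is counted once, before the filter invalidates anything, and a PSM matching several proteins is judged by its best-supported protein.

```python
from collections import Counter, defaultdict

def single_psm_per_query(psms):
    """Keep a single validated PSM per query; invalidate the others."""
    # Number of currently valid PSMs identifying each protein.
    valid_counts = Counter()
    by_query = defaultdict(list)
    for psm in psms:
        if psm["is_validated"]:
            valid_counts.update(psm["proteins"])
            by_query[psm["query"]].append(psm)

    def protein_support(psm):
        return max((valid_counts[prot] for prot in psm["proteins"]), default=0)

    for group in by_query.values():
        best_score = max(p["score"] for p in group)
        tied = [p for p in group if p["score"] == best_score]
        # max() returns the first maximal item, so the input order
        # settles any tie that remains ("choose the first PSM").
        winner = max(tied, key=protein_support)
        for psm in group:
            if psm is not winner:
                psm["is_validated"] = False
    return psms
```

For example, if two PSMs of the same query both score 40.0, the one whose protein is supported by more valid PSMs is kept and the other is invalidated.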

Note: For testing purposes, this filter can be run after Peptide Matches Validation (see below). In this case, the FDR requested in the validation step will be modified by this filter. This option only exists to confirm whether this filter is needed and to validate the way it is applied.

Single PSM Per Rank Filter

This filter validates only one PSM per pretty rank. Combining this filter with a Pretty Rank Filter should give the same behaviour as the “Single PSM Per Query Filter”.

The PSM is selected using the following rules:

For each pretty rank of each query:

  • If there are several PSMs:
    • Choose the PSM that identifies the protein with the largest number of valid PSMs.
    • If there is a tie:
      • Choose the first PSM.

Note: This filter is currently being tested alongside the “Single PSM Per Query Filter”. A decision on which filter to keep and how it is applied will be made later.

Peptide Matches Validation

Specify an expected FDR and tune a specified filter in order to obtain this FDR. See how FDR is calculated

Once the pre-filters described above have been applied, a validation algorithm can be run to control the FDR: given a criterion, the system estimates the best threshold value to reach the specified FDR.
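The idea of tuning a threshold to reach a target FDR can be sketched as follows, using the PSM score as the criterion. This is a simplified illustration only: the FDR is assumed here to be the ratio of decoy to target PSMs above the threshold, which may differ from the calculation Proline uses, and the names `is_decoy` and `find_score_threshold` are hypothetical.

```python
def estimate_fdr(target_count, decoy_count):
    """Assumed FDR estimate: decoy PSMs divided by target PSMs."""
    return decoy_count / target_count if target_count else 0.0

def find_score_threshold(psms, expected_fdr):
    """Return the lowest score threshold whose estimated FDR does not
    exceed expected_fdr, or None if no threshold reaches it."""
    # Each distinct observed score is tried as a threshold, lowest first,
    # so the first hit keeps as many PSMs as possible.
    for threshold in sorted({p["score"] for p in psms}):
        kept = [p for p in psms if p["score"] >= threshold]
        targets = sum(1 for p in kept if not p["is_decoy"])
        decoys = len(kept) - targets
        if targets and estimate_fdr(targets, decoys) <= expected_fdr:
            return threshold
    return None
```

Scanning from the lowest score upwards returns the most permissive threshold that satisfies the requested FDR. Other criteria, such as the e-value, could be tuned in the same way with the comparison reversed.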

prolineconcepts/peptidematchesfilteringandvalidation.txt · Last modified: 2015/07/10 15:20 by 132.168.72.225