This shows you the differences between two versions of the page.
prolineconcepts:rsvalidation [2013/11/21 14:32] 132.168.72.225 [Peptide Matches Validation] |
prolineconcepts:rsvalidation [2015/07/10 11:05] (current) 132.168.72.225 |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== Search Result Validation ====== | + | ====== Validation Algorithm ====== |
- | Once a result filer has been imported and a search result created, the validation is performed in 4 main steps: | + | Once a result file has been imported and a search result created, the validation is performed in 4 main steps: |
- | - [[.:rsvalidation#Peptide Matches Filtering|Peptide Matches filtering]] and [[.:rsvalidation#Peptide Matches Validation|validation]] | + | - [[prolineconcepts:PeptideMatchesFilteringAndValidation|Peptide Matches filtering and Validation]] |
- [[.:ProteinInferer|Protein Inference]] (peptides and proteins grouping) | - [[.:ProteinInferer|Protein Inference]] (peptides and proteins grouping) | ||
- [[.:ProtScoring|Protein and Proteins Sets scoring]] | - [[.:ProtScoring|Protein and Proteins Sets scoring]] | ||
- | - [[.:rsvalidation#Protein Sets Filtering|Protein sets filtering]] and [[.:rsvalidation#Protein Sets Validation|validation]] | + | - [[prolineconcepts:ProteinSetsFilteringAndValidation|Protein Sets Filtering and Validation]] |
- | Finally, the [[.:rs_rsm|Identification Result]] issued from these steps is stored in the identification database. Different validations of a Search Result can be performed, and a new Identification Summary of this Search Result is created for each validation. | + | Finally, the [[.:rsm|Identification Result]] issued from these steps is stored in the identification database. Different validations of a Search Result can be performed, and a new Identification Summary of this Search Result is created for each validation. |
- | ===== Peptide Matches Filtering ===== | ||
- | Peptide Matches identified in a search result can be filtered using one or more predefined filters (described hereafter). Only validated peptide matches are considered in further steps.\\ | ||
- | |||
- | |||
- | ==== Basic Score Filter ==== | ||
- | |||
- | All PSMs whose score is lower than a given threshold are invalidated. | ||
- | |||
- | ==== Pretty Rank Filter ==== | ||
- | |||
- | This filtering is performed after temporarily joining the target and decoy PSMs corresponding to the same query (only really needed for separate forward/reverse database searches). Then, for each query, the target and decoy PSMs are sorted by score. A rank (Mascot pretty rank) is computed for each PSM from its position in this ordering: PSMs with almost equal scores (difference < 0.1) are assigned the same rank. | ||
- | All PSMs with a rank greater than the specified one are invalidated. | ||
- | |||
- | |||
- | ==== Minimum Sequence length Filter ==== | ||
- | |||
- | PSMs corresponding to short peptide sequences (length lower than the provided one) can be invalidated using this parameter. | ||
- | |||
- | ==== Mascot eValue Filter ==== | ||
- | |||
- | Filters PSMs using the Mascot expectation value (e-value), which reflects the difference between the PSM score and the Mascot identity threshold (p=0.05). | ||
- | PSMs having an e-value greater than the specified one are invalidated. | ||
- | |||
- | ==== Mascot adjusted eValue Filter ==== | ||
- | |||
- | Proline is able to compute an adjusted e-value. It first selects the lower of the identity and homology thresholds (p=0.05). Then it computes the e-value using this selected threshold. | ||
- | PSMs having an adjusted e-value greater than the specified one are invalidated. | ||
- | |||
- | ==== Mascot p-value on Identity Filter ==== | ||
- | |||
- | Given a specific p-value, the Mascot identity threshold is calculated for each query, and all peptide matches associated with the query whose score is lower than the calculated identity threshold are invalidated.\\ | ||
- | When a Mascot result file is parsed, the number of PSM candidates for each spectrum is saved, so the identity threshold can be recalculated for any p-value. | ||
- | |||
- | ==== Mascot p-value on homology Filter ==== | ||
- | |||
- | Given a specific p-value, the Mascot homology threshold is inferred for each query, and all peptide matches associated with the query whose score is lower than the inferred homology threshold are invalidated. | ||
- | |||
- | ===== Peptide Matches Validation ===== | ||
- | |||
- | Specify an expected FDR and tune a specified filter in order to obtain this FDR. | ||
- | |||
- | Once the previously described pre-filters have been applied, a validation algorithm can be run to control the FDR: given a criterion, the system estimates the best threshold value to reach a specified FDR. | ||
- | ===== Protein Sets Filtering ===== | ||
- | |||
- | ==== Specific peptides Filter ==== | ||
- | |||
- | Invalidates Protein Sets that do not have at least x peptides identifying only that protein set. The specificity is considered at the DataSet level. | ||
- | |||
- | This filter goes through all Protein Sets from worst score to best score. For each one, if the protein set is invalidated, the properties of its associated peptides are updated before moving to the next protein set. The peptide property in question is the number of protein sets the peptide identifies. | ||
- | |||
- | ===== Protein Sets Validation ===== | ||
- | |||
- | FIXME todo FIXME |
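
The Pretty Rank Filter described in the removed revision can be sketched as follows. This is a minimal illustration, not Proline's code: the function names are hypothetical, and it assumes the 0.1 score difference is measured against the best-scoring PSM of the current rank group (the text does not say whether the comparison is with the group leader or with the immediately preceding PSM). The input is the list of scores of all PSMs of one query, target and decoy already joined.

```python
from typing import List


def pretty_ranks(scores: List[float], tolerance: float = 0.1) -> List[int]:
    """Return one rank per PSM score, in the same order as the input.

    Assumption: a PSM shares the rank of the current group when its score
    is less than `tolerance` below the best score of that group.
    """
    # Indices of the PSMs sorted by decreasing score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    rank = 0
    group_best = None  # best score of the current rank group
    for i in order:
        if group_best is None or group_best - scores[i] >= tolerance:
            # Score is clearly lower: open a new rank group.
            rank += 1
            group_best = scores[i]
        ranks[i] = rank
    return ranks


def pretty_rank_filter(scores: List[float], max_rank: int) -> List[bool]:
    """Return True for PSMs that stay valid (rank <= max_rank)."""
    return [r <= max_rank for r in pretty_ranks(scores)]
```

For example, with scores 50.0 and 49.95 the two PSMs share rank 1, so a maximum rank of 1 keeps both.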
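
For the Mascot eValue and p-value on Identity filters, the relationship between score, threshold and e-value can be illustrated with the formulas commonly given in the Mascot documentation. The sketch below assumes those formulas (identity threshold = -10·log10(p / number of candidates), e-value = p·10^((threshold - score) / 10)); the function names are hypothetical and this is not taken from Proline's source.

```python
import math


def identity_threshold(n_candidates: int, p_value: float = 0.05) -> float:
    """Assumed Mascot identity threshold for a query.

    n_candidates is the number of candidate peptides for the spectrum,
    which the text says is saved when the result file is parsed.
    """
    return -10.0 * math.log10(p_value / n_candidates)


def e_value(score: float, threshold: float, p_value: float = 0.05) -> float:
    """Assumed Mascot expectation value of a PSM.

    `threshold` must be the threshold computed at the same `p_value`.
    A score equal to the threshold gives an e-value equal to p_value;
    every 10 score units above the threshold divide it by 10.
    """
    return p_value * 10.0 ** ((threshold - score) / 10.0)


def is_valid(score: float, n_candidates: int, max_e_value: float) -> bool:
    """A PSM is invalidated when its e-value exceeds the specified one."""
    return e_value(score, identity_threshold(n_candidates)) <= max_e_value
```

Under the same assumptions, an adjusted e-value would pass the lower of the identity and homology thresholds as `threshold`.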
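
The Peptide Matches Validation step (tuning a filter threshold to reach an expected FDR) can be sketched as a simple scan over candidate score thresholds. This is an illustration under stated assumptions: the FDR is estimated as the decoy/target ratio, which is only one of several possible estimators, and the function names are hypothetical. The text does not specify which estimator or search strategy Proline uses.

```python
from typing import List, Optional


def estimate_fdr(n_target: int, n_decoy: int) -> float:
    """Assumed estimator: number of decoy hits divided by target hits."""
    return n_decoy / n_target if n_target else 0.0


def find_score_threshold(
    target_scores: List[float],
    decoy_scores: List[float],
    expected_fdr: float,
) -> Optional[float]:
    """Return the lowest score threshold whose estimated FDR is at most
    expected_fdr, or None when no threshold keeps any target PSM."""
    # Every observed score is a candidate threshold, tried from the most
    # permissive (lowest) to the most stringent (highest).
    for threshold in sorted(set(target_scores) | set(decoy_scores)):
        # PSMs scoring below the threshold are invalidated.
        n_target = sum(s >= threshold for s in target_scores)
        n_decoy = sum(s >= threshold for s in decoy_scores)
        if n_target > 0 and estimate_fdr(n_target, n_decoy) <= expected_fdr:
            return threshold
    return None
```

The same scan applies to any criterion that orders PSMs (e-value, rank), with the comparison direction adapted to that criterion.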
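
The Specific peptides Filter walk-through (worst score first, updating peptide properties after each invalidation) can be sketched as below. The data layout is an assumption made for the example: protein sets are given as a mapping from a protein set id to its peptide ids, and the "peptide property" is kept as a counter of valid protein sets per peptide. Names are hypothetical.

```python
from typing import Dict, Set


def specific_peptides_filter(
    protein_sets: Dict[str, Set[str]],
    scores: Dict[str, float],
    min_specific: int,
) -> Set[str]:
    """Return the ids of the protein sets that remain valid."""
    # Peptide property: number of still-valid protein sets it identifies.
    set_count: Dict[str, int] = {}
    for peptides in protein_sets.values():
        for pep in peptides:
            set_count[pep] = set_count.get(pep, 0) + 1

    valid = set(protein_sets)
    # Go through the protein sets from worst score to best score.
    for ps_id in sorted(protein_sets, key=lambda k: scores[k]):
        peptides = protein_sets[ps_id]
        # A peptide is specific when it identifies only this protein set.
        n_specific = sum(1 for pep in peptides if set_count[pep] == 1)
        if n_specific < min_specific:
            valid.discard(ps_id)
            # Update the peptide property before the next protein set, so
            # shared peptides can become specific to better-scoring sets.
            for pep in peptides:
                set_count[pep] -= 1
    return valid
```

Processing the worst-scoring sets first means a peptide shared with an invalidated low-scoring set is counted as specific for the better-scoring set that is examined later.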