
Protein grouping

Protein grouping is performed from a parent context and consists of:

creating new peptides (grouped peptides) and matches from the union of all peptides referenced in the child contexts (the direct children if they exist, otherwise deeper descendants);
defining protein groups using the new set of peptides and matches associated with the parent context.

Algorithm

The protein grouping mechanism is detailed beneath the following image.

Step 1 - Peptide grouping

Peptides from different child contexts or identifications, attached directly or indirectly to a context, are grouped.
Peptides must have the same sequence and the same experimental mass to be grouped.
Peptide grouping results in new peptides that are attached to the parent context and that reference their child peptides.

A new peptide is constructed as follows:

charge, sequence, PTM, missed cleavages and calculated mass are copied from the first child peptide found
experimental mass and retention time are set to the average over the children
score is set to the maximum of the child scores
the child list contains all the peptides found with the same sequence and the same mass
to define the list of matches associated with the new peptide, matches from all child peptides are grouped by matched protein. The score of each created match is set to the maximum of all child match scores, and its start and end values are equal to the start and end of the child matches.
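
The construction rules above can be sketched in Python as follows. This is only an illustrative reading of the description, not hEIDI's actual code: the `Peptide` and `Match` classes, the `group_peptides` function and the `mass_decimals` rounding used to decide that two experimental masses are "the same" are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Match:
    protein: str  # accession of the matched protein
    score: float
    start: int
    end: int


@dataclass
class Peptide:
    sequence: str
    charge: int
    ptm: str
    missed_cleavage: int
    calculated_mass: float
    experimental_mass: float
    retention_time: float
    score: float
    matches: list = field(default_factory=list)
    children: list = field(default_factory=list)


def group_peptides(child_peptides, mass_decimals=4):
    """Group child peptides that share sequence and experimental mass."""
    buckets = defaultdict(list)
    for pep in child_peptides:
        # Assumption: masses are considered equal after rounding.
        key = (pep.sequence, round(pep.experimental_mass, mass_decimals))
        buckets[key].append(pep)

    grouped = []
    for children in buckets.values():
        first = children[0]
        new_pep = Peptide(
            # Copied from the first child peptide found.
            sequence=first.sequence,
            charge=first.charge,
            ptm=first.ptm,
            missed_cleavage=first.missed_cleavage,
            calculated_mass=first.calculated_mass,
            # Averaged over the children.
            experimental_mass=mean(c.experimental_mass for c in children),
            retention_time=mean(c.retention_time for c in children),
            # Best child score.
            score=max(c.score for c in children),
            children=list(children),
        )
        # Group the child matches by matched protein.
        by_protein = defaultdict(list)
        for child in children:
            for m in child.matches:
                by_protein[m.protein].append(m)
        for protein, ms in by_protein.items():
            # Max score; start and end taken from the child matches.
            new_pep.matches.append(
                Match(protein, max(m.score for m in ms), ms[0].start, ms[0].end)
            )
        grouped.append(new_pep)
    return grouped
```

Calling `group_peptides` with all peptides collected from the child contexts returns the list of new peptides to attach to the parent context.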

Step 2 - Protein grouping

Once the new peptides have been created and associated with the parent context, proteins are grouped in the same way as in Mascot® and IRMa:

All proteins identified by the same set of peptides are grouped together as a protein group. Proteins identified by only a sub-set of these peptides are distinguished within each group. The typical protein of a group is one of its same-set proteins. The rules used to select the typical protein can be specified by the user.
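
A minimal sketch of this same-set / sub-set logic is given below. It is a simplified interpretation and not the Mascot, IRMa or hEIDI implementation: the function name `group_proteins`, the input shape (a dictionary mapping a protein accession to its set of peptide identifiers) and the rule used to pick the typical protein (first accession in alphabetical order) are assumptions made for illustration.

```python
from collections import defaultdict


def group_proteins(protein_to_peptides):
    """Build protein groups from a {protein: set of peptides} mapping."""
    # Proteins identified by exactly the same peptides form one same-set.
    by_peptide_set = defaultdict(list)
    for protein, peptides in protein_to_peptides.items():
        by_peptide_set[frozenset(peptides)].append(protein)

    groups = []
    for pep_set, proteins in by_peptide_set.items():
        # A peptide set strictly contained in another one is a sub-set
        # of that group and does not create a group of its own.
        if any(pep_set < other for other in by_peptide_set):
            continue
        sub_set = sorted(
            protein
            for other, others in by_peptide_set.items()
            if other < pep_set
            for protein in others
        )
        same_set = sorted(proteins)
        groups.append({
            "peptides": pep_set,
            "same_set": same_set,
            "sub_set": sub_set,
            # Assumption: the typical protein is the first same-set
            # accession; the real rule is chosen by the user.
            "typical": same_set[0],
        })
    return groups
```

For instance, if ProtA and ProtB are both identified by pep1, pep2 and pep3 while ProtC is identified by pep1 and pep2 only, the sketch returns a single group whose same-set is ProtA and ProtB and whose sub-set is ProtC.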

Protein grouping results in new groups of proteins and peptides, attached to the parent context. The properties of the protein group and of the protein matches are set as follows:

Create a protein match for each protein of the group, where the list and count of matching species are set.
Calculate the score and coverage values using all matching species.
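
To make these two steps concrete, here is a hedged sketch of how a protein match could be filled in. The text does not say how hEIDI computes the score, so summing the peptide match scores is purely an assumption; coverage is computed as the percentage of residues covered by at least one match, with 1-based inclusive positions. The names `sequence_coverage` and `build_protein_match` and the tuple layout are hypothetical.

```python
def sequence_coverage(positions, protein_length):
    """Percentage of residues covered by at least one (start, end) match.

    Positions are assumed to be 1-based and inclusive.
    """
    covered = set()
    for start, end in positions:
        covered.update(range(start, end + 1))
    return 100.0 * len(covered) / protein_length


def build_protein_match(accession, peptide_matches, protein_length):
    """peptide_matches is a list of (peptide_id, score, start, end) tuples."""
    peptides = sorted({pid for pid, _, _, _ in peptide_matches})
    return {
        "accession": accession,
        "peptides": peptides,             # list of matching species
        "peptide_count": len(peptides),   # count of matching species
        # Assumption: protein score = sum of the peptide match scores.
        "score": sum(score for _, score, _, _ in peptide_matches),
        "coverage": sequence_coverage(
            [(start, end) for _, _, start, end in peptide_matches],
            protein_length,
        ),
    }
```

Overlapping matches are counted once in the coverage: matches at positions 1-10 and 6-15 cover 15 residues, not 20.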

Beware of protein grouping order

You need to be careful when grouping proteins within a tree of contexts. Let's take the following example:

Rootnode
  |_ Context1
     |_ F085255.dat
     |_ F085256.dat
     |_ F085257.dat
  |_ Context2
     |_ F085258.dat        
     |_ F085259.dat

It's possible:

case 1 - to group proteins at the Rootnode level, hEIDI will then group proteins from all the identification results, or
case 2 - to group proteins starting from the leaf contexts (Context1 and Context2), then ending with the Rootnode.

At present, when launching the protein group algorithm, you can tell hEIDI to filter some proteins and/or peptides. For example, if you decide to filter proteins with a number of peptides lower than 2, it is important to understand that doing this may give different results in cases 1 & 2.

Rootnode
  |_ Context1
     ProtA (pep1, pep2)
  |_ Context2
     ProtA (pep1, pep5)

In case 2, with a filter that discards proteins identified by fewer than 3 peptides, ProtA (2 peptides in each leaf context) will be filtered at an early stage (when grouping proteins in Context1 and Context2), and will not appear in the final result.

But in case 1, when grouping proteins at the Rootnode level, ProtA will 'gain' one more peptide (ProtA will be identified by 3 peptides, pep1, pep2 and pep5, instead of 2). So ProtA will not be filtered and will appear in the final result.
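
The difference between the two cases can be reproduced with a few lines of Python. This is a toy model, not hEIDI code: contexts are reduced to dictionaries mapping a protein to its peptides, the helper names are hypothetical, and the threshold of at least 3 peptides per protein is an assumption chosen so that a protein with 2 peptides is discarded and one with 3 is kept.

```python
def filter_proteins(proteins, min_peptides):
    """Keep only proteins identified by at least min_peptides peptides."""
    return {p: peps for p, peps in proteins.items() if len(peps) >= min_peptides}


def merge(*contexts):
    """Union of the peptides of each protein across several contexts."""
    merged = {}
    for ctx in contexts:
        for protein, peps in ctx.items():
            merged.setdefault(protein, set()).update(peps)
    return merged


context1 = {"ProtA": {"pep1", "pep2"}}
context2 = {"ProtA": {"pep1", "pep5"}}
MIN_PEPTIDES = 3  # assumed threshold for this example

# Case 1: group directly at the Rootnode level, then filter.
case1 = filter_proteins(merge(context1, context2), MIN_PEPTIDES)

# Case 2: filter in each leaf context first, then group at the Rootnode.
case2 = filter_proteins(
    merge(
        filter_proteins(context1, MIN_PEPTIDES),
        filter_proteins(context2, MIN_PEPTIDES),
    ),
    MIN_PEPTIDES,
)

print(sorted(case1))  # ['ProtA']
print(sorted(case2))  # []
```

In case 1 the union pep1, pep2, pep5 reaches the threshold, whereas in case 2 ProtA is already removed from both leaf contexts before anything reaches the Rootnode.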
