Protein grouping
- Algorithm
  - Step 1 - Peptide grouping
  - Step 2 - Protein grouping
- Beware of protein grouping order

Protein grouping

Protein grouping is performed from a parent context and consists of:

- creating new peptides (grouped peptides) and matches from the union of all peptides referenced in the child contexts (the direct children if they exist, otherwise deeper descendants);
- defining protein groups using the new set of peptides and matches associated with the parent context.

Algorithm

Since hEIDI 1.11.0, a new peptide/protein grouping algorithm has been implemented. It is part of a broader effort to improve overall performance in hEIDI.
Datasets keep getting bigger (e.g. VELOS), and loading all data in memory in a single hEIDI session, as was done before hEIDI 1.11.0, is no longer possible.
The goal is now to load the minimum amount of information in the hEIDI session and to have algorithms that save their results directly to the MSIdb.
We started by optimizing the peptide/protein grouping algorithm because it requires loading a complex object tree and is therefore very memory-consuming. Other algorithms will be optimized progressively in later hEIDI versions.

What changes for the user:

- Paradoxically, grouping is a little slower than in previous hEIDI versions, but initial loading and save operations are faster.
- The MSIdb must be saved before launching grouping.
- The peptide/protein grouping result is automatically saved to the MSIdb.

The protein grouping mechanism is detailed below.

Step 1 - Peptide grouping

grouping

- Peptides from different child contexts or identifications, attached directly or indirectly to a context, are grouped.
- Peptides must have the same sequence and the same calculated mass to be grouped.
- Peptide grouping results in new peptides that are attached to the parent context and have child peptides.

A new peptide is constructed as follows (since hEIDI 1.13.0):

- the peptide reference (sequence, PTM), missed cleavage and calculated mass are copied from the first child peptide found;
- the experimental mass, charge, delta mass, score, retention time and fragmentation count are copied from the best child;
- the child list is filled as peptides with the same sequence and the same mass are found;
- to define the list of matches associated with the new peptide, the matches from all child peptides are grouped by matched protein. The score of each created match is set to the maximum of all child match scores, and its start and end values are equal to the child matches' start and end.
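The construction rules above can be illustrated with a minimal Python sketch. It is not hEIDI's actual code: the `Peptide` and `Match` classes and the `group_peptides` function are hypothetical names, only a subset of the copied fields is modelled, "best child" is assumed to mean the child with the highest score, and calculated masses are compared by exact equality.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Match:
    protein: str  # accession of the matched protein
    score: float
    start: int
    end: int


@dataclass
class Peptide:
    sequence: str
    ptm: str
    calculated_mass: float
    missed_cleavage: int
    score: float
    charge: int
    experimental_mass: float
    matches: list = field(default_factory=list)
    children: list = field(default_factory=list)


def group_peptides(child_peptides):
    """Group child peptides sharing the same sequence and calculated mass."""
    buckets = defaultdict(list)
    for pep in child_peptides:
        buckets[(pep.sequence, pep.calculated_mass)].append(pep)

    grouped = []
    for children in buckets.values():
        first = children[0]
        # Assumption: the "best" child is the one with the highest score.
        best = max(children, key=lambda p: p.score)
        parent = Peptide(
            sequence=first.sequence,
            ptm=first.ptm,
            calculated_mass=first.calculated_mass,
            missed_cleavage=first.missed_cleavage,
            score=best.score,
            charge=best.charge,
            experimental_mass=best.experimental_mass,
            children=list(children),
        )
        # Merge child matches per matched protein, keeping the maximum score.
        by_protein = {}
        for child in children:
            for m in child.matches:
                kept = by_protein.get(m.protein)
                if kept is None:
                    by_protein[m.protein] = Match(m.protein, m.score, m.start, m.end)
                elif m.score > kept.score:
                    kept.score = m.score
        parent.matches = list(by_protein.values())
        grouped.append(parent)
    return grouped
```

Keying the buckets on the (sequence, calculated mass) pair keeps the grouping to a single pass over the child peptides; a real implementation would probably compare masses with a tolerance rather than by strict equality.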

filtering

Different filters can be applied during grouping.

- A first protein filter is applied while building the list of proteins that each peptide matches (to filter reverse proteins, for instance). This is done before new matches are created.
- A second protein filter can be applied to each protein depending on its list of matches. This filter is applied after new matches are created. Typically, it is used to filter out proteins with fewer than x peptides.
- The last filter allows the user to filter the new grouped peptides.

The filtered proteins or species are not taken into account in the final grouping result: they are removed from the result (unlike with the protein filter operation). Another difference from the protein filter operation is that filtering is done on each protein individually.
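As a rough illustration of how the three filters fit together, here is a Python sketch under stated assumptions. The function name `apply_grouping_filters`, its parameters and the dictionary-based data model are hypothetical, and the position of the peptide filter after the second protein filter is a guess, since the text does not specify it.

```python
def apply_grouping_filters(peptide_to_proteins, keep_protein, min_peptides, keep_peptide):
    """Apply the three filter stages to a {peptide: set of proteins} mapping."""
    # Stage 1: protein filter applied before matches are created
    # (for example, to drop reverse proteins).
    stage1 = {
        pep: {prot for prot in prots if keep_protein(prot)}
        for pep, prots in peptide_to_proteins.items()
    }

    # Build the matches: for each protein, the peptides that match it.
    protein_to_peptides = {}
    for pep, prots in stage1.items():
        for prot in prots:
            protein_to_peptides.setdefault(prot, set()).add(pep)

    # Stage 2: protein filter based on each protein's list of matches.
    kept = {
        prot: peps
        for prot, peps in protein_to_peptides.items()
        if len(peps) >= min_peptides
    }

    # Stage 3: filter on the new grouped peptides (assumed to run last).
    return {
        prot: {pep for pep in peps if keep_peptide(pep)}
        for prot, peps in kept.items()
    }


result = apply_grouping_filters(
    {
        "pepA": {"P1", "REV_P1"},
        "pepB": {"P1", "P2"},
        "pepC": {"P2"},
        "pepD": {"P3"},
    },
    keep_protein=lambda prot: not prot.startswith("REV_"),
    min_peptides=2,
    keep_peptide=lambda pep: pep != "pepC",
)
```

In this example `REV_P1` is dropped by the first filter, `P3` by the second (it has a single peptide), and `pepC` by the peptide filter.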

Step 2 - Protein grouping

Once the new peptides have been created and associated with the parent context, the same grouping as the one performed by Mascot® and IRMa is applied.

Before the protein grouping is executed, however, the list of proteins to be considered is filtered using the optional protein filter. This means that proteins are filtered individually and that the filter is not applied at the protein group level. See the protein group filters page.

Protein grouping consists of the following:

All proteins identified by the same set of peptides are grouped together as a protein group. Proteins sharing only a sub-set of these peptides are distinguished within each group. A typical protein is chosen among the same-set proteins. The rules used to select this typical protein can be specified by the user.
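The same-set / sub-set logic described above can be sketched in Python as follows. This is an illustration, not the hEIDI or Mascot implementation: `group_proteins` is a hypothetical name, proteins are represented as a simple mapping from accession to peptide set, and the typical protein is arbitrarily taken as the first accession in alphabetical order, whereas the real rule is user-configurable.

```python
from collections import defaultdict


def group_proteins(protein_to_peptides):
    """Build protein groups from a {protein: set of peptides} mapping."""
    # Proteins identified by exactly the same peptide set form a same-set.
    same_sets = defaultdict(list)
    for prot, peps in protein_to_peptides.items():
        same_sets[frozenset(peps)].append(prot)

    groups = []
    for peps, prots in same_sets.items():
        # A peptide set strictly contained in another one does not
        # define a group of its own; its proteins are sub-set proteins.
        if any(peps < other for other in same_sets):
            continue
        sub_set = sorted(
            prot
            for other, members in same_sets.items()
            if other < peps
            for prot in members
        )
        same_set = sorted(prots)
        groups.append(
            {
                "peptides": set(peps),
                "same_set": same_set,
                "sub_set": sub_set,
                # Assumption: typical protein = first accession alphabetically.
                "typical": same_set[0],
            }
        )
    return groups


groups = group_proteins(
    {
        "P1": {"a", "b", "c"},
        "P2": {"a", "b", "c"},
        "P3": {"a", "b"},
        "P4": {"d"},
    }
)
```

Here `P1` and `P2` form one group with `P3` as a sub-set protein, and `P4` forms a second group on its own.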

Protein grouping results in new groups of proteins and peptides, attached to the parent context. The protein group and protein match properties are set as follows:

- A protein match is created for each protein of the group, with the list and count of matching species set.
- The score and coverage values are calculated using all matching species.
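The text does not give the formulas used for score and coverage. As an illustration only, sequence coverage is commonly computed as the fraction of protein residues covered by at least one peptide match; the sketch below assumes this definition, 1-based inclusive start and end positions, and a known protein length. The function name `sequence_coverage` is hypothetical, and the score calculation is deliberately left out.

```python
def sequence_coverage(matches, protein_length):
    """Fraction of residues covered by at least one (start, end) match.

    Positions are assumed to be 1-based and inclusive.
    """
    covered = 0
    current_end = 0  # last residue already counted
    for start, end in sorted(matches):
        # Skip the part of this match that overlaps residues already counted.
        start = max(start, current_end + 1)
        if end >= start:
            covered += end - start + 1
            current_end = end
    return covered / protein_length


coverage = sequence_coverage([(1, 10), (5, 20), (31, 40)], protein_length=100)
```

Overlapping matches are counted once, so the three matches above cover 30 residues out of 100.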

Beware of protein grouping order

You need to be careful when grouping proteins within a tree of contexts. Let's take the following example:

Rootnode
  |_ Context1
     |_ F085255.dat
     |_ F085256.dat
     |_ F085257.dat
  |_ Context2
     |_ F085258.dat        
     |_ F085259.dat

Two approaches are possible:

case 1 - group proteins at the Rootnode level; hEIDI then groups proteins from all the identification results, or
case 2 - group proteins starting from the leaf contexts (Context1 and Context2), then finish with the Rootnode.

At present, when launching the protein grouping algorithm, you can tell hEIDI to filter some proteins and/or peptides. For example, if you decide to filter out proteins identified by fewer than 3 peptides, it is important to understand that this may give different results in cases 1 and 2.

Rootnode
  |_ Context1
     ProtA (pep1, pep2)
  |_ Context2
     ProtA (pep1, pep5)

In case 2, ProtA is identified by only 2 peptides in each leaf context, so a filter requiring at least 3 peptides removes it at an early stage (when grouping proteins in Context1 and Context2), and it does not appear in the final result.

In case 1, when grouping proteins at the Rootnode level, ProtA 'gains' one more peptide (it is identified by 3 peptides instead of 2). ProtA is therefore not filtered and appears in the final result.
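The difference between the two cases can be reproduced with a small Python sketch. It is a simplified model, not hEIDI code: `group_context` is a hypothetical helper that merges the peptides of its children per protein and then applies a minimum-peptide filter, and the threshold of 3 peptides is an assumed value chosen so that ProtA passes only when all its peptides are pooled.

```python
def group_context(children, min_peptides):
    """Merge child peptides per protein, then drop proteins under the threshold."""
    merged = {}
    for child in children:
        for prot, peps in child.items():
            merged.setdefault(prot, set()).update(peps)
    return {
        prot: peps for prot, peps in merged.items() if len(peps) >= min_peptides
    }


context1 = {"ProtA": {"pep1", "pep2"}}
context2 = {"ProtA": {"pep1", "pep5"}}
MIN_PEPTIDES = 3  # assumed filter threshold

# Case 1: group directly at the Rootnode level.
case1 = group_context([context1, context2], MIN_PEPTIDES)

# Case 2: group each leaf context first, then the Rootnode.
leaf1 = group_context([context1], MIN_PEPTIDES)
leaf2 = group_context([context2], MIN_PEPTIDES)
case2 = group_context([leaf1, leaf2], MIN_PEPTIDES)

print({prot: sorted(peps) for prot, peps in case1.items()})
print(case2)
```

In case 1 ProtA is kept with pep1, pep2 and pep5; in case 2 it has already been removed from both leaf contexts, so the Rootnode result is empty.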
