More details about PRIDE XML format & runDescriptor.properties
- [FEATURE IN PROGRESS] Details about runDescriptor.properties
  - Default PRIDE XML file (i.e. obtained if runDescriptor.properties is empty)
- Controlled Vocabulary (CV)

PRIDE (PRoteomics IDEntifications database) is a public data repository for proteomics data.
PRIDE XML is the file format used for submission to the publicly available PRIDE database.
The PRIDE XML schema documentation is available here.

To provide metadata about your experiment in the PRIDE XML file, you can either use PRIDE Converter (recommended) or the runDescriptor.properties file in hEIDI.

[FEATURE IN PROGRESS] Details about runDescriptor.properties

Warning

As this feature is in progress, we highly recommend using the PRIDE Converter instead, in order to easily create a fully detailed header for your PRIDE XML file.

In the runDescriptor.properties file (in <HEIDI_PROJECT_DIR>\heidi.project\), the user provides information (as properties) to describe the experiment (title, description, contact info), the sample(s), the instrument, the protocol steps, etc.

The user must use very specific keys for these properties, for example contact.name=Hippolyte Calys to give the name of the person to contact.
When running the PRIDE XML Export, these properties are then parsed and replaced by the appropriate XML tags (simple XML tags or Controlled Vocabulary params) to create the resulting PRIDE XML file.
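
As a rough illustration of this mapping, the Python sketch below reads properties text and builds a `<contact>` element like the one in the default PRIDE XML file. The pairing of properties to tags is inferred from the property list and the default file on this page. The helper names `parse_properties` and `build_contact` and the institute value are hypothetical, and hEIDI's own exporter may work differently.

```python
import xml.etree.ElementTree as ET

# Assumed pairing of simple properties to mzData tags, inferred from this page.
CONTACT_TAGS = {
    "contact.name": "name",
    "contact.institute": "institution",
    "contact.info": "contactInfo",
}


def parse_properties(text):
    """Parse 'key=value' lines; blank lines and '#' comments are skipped."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def build_contact(props):
    """Build a <contact> element from the contact.* properties that are set."""
    contact = ET.Element("contact")
    for key, tag in CONTACT_TAGS.items():
        if key in props:
            ET.SubElement(contact, tag).text = props[key]
    return contact


# Hypothetical runDescriptor.properties content (only two keys set).
sample = """
# contact section
contact.name=Hippolyte Calys
contact.institute=Example Institute
"""
contact_xml = ET.tostring(build_contact(parse_properties(sample)), encoding="unicode")
print(contact_xml)
```

Properties that are not set are simply skipped here, whereas the real export writes a '!!!' default value for them.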

Current limitations of using the runDescriptor.properties file

- The list of available properties is not sufficient to create a fully detailed header for the PRIDE XML file: information regarding protocol steps, sample description, additional information, etc. cannot be provided as properties and must be specified manually after the PRIDE XML file is created.
- Each property is mapped to a simple XML tag or to a specific CV Param. When the property is mapped to a CV Param, the user has to manually provide the accession and cvLabel of the cvParam after the PRIDE XML file is created.

First, see what a CV Param is and how to use it.
Then, you will need to browse ontologies to find the appropriate CV Params (Ontology Lookup Service).
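
The manual correction of a cvParam can be sketched in Python as follows. This is only an illustration: the placeholder element is copied from the default PRIDE XML file on this page, the accession PSI:1000073 comes from the instrument.source list, and the helper `fill_cv_param` is hypothetical. The cvLabel, name and value shown are assumptions, so check what PRIDE expects against the ontology.

```python
import xml.etree.ElementTree as ET

# Placeholder cvParam as it appears in the default PRIDE XML file on this page.
fragment = (
    '<source>'
    '<cvParam accession="1" cvLabel="psi" name="type" '
    'value="!!! INSTRUMENT SOURCE!!!"/>'
    '</source>'
)


def fill_cv_param(element, accession, cv_label, name, value=""):
    """Overwrite the four attributes of a cvParam element in place."""
    element.set("accession", accession)
    element.set("cvLabel", cv_label)
    element.set("name", name)
    element.set("value", value)


source = ET.fromstring(fragment)
cv_param = source.find("cvParam")
# Assumed values: verify the accession, label and name in the ontology browser.
fill_cv_param(cv_param, "PSI:1000073", "PSI", "Electrospray Ionization")
print(ET.tostring(source, encoding="unicode"))
```

The same edit can of course be made by hand in a text editor; the sketch only shows which attributes need to change.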

Notes
- Missing properties receive a default value so that the required XML tags can be created, but the user has to change these values afterwards. Default values are surrounded by three exclamation points ('!!!') to highlight them. Not all properties are required (see the list below).
- The CVs given below are examples; they are not exclusive and may change with the PRIDE format.

contact.name=NAME: contact person name (first and last names), or role name (required)
contact.institute=INSTITUTION: organisation with which the contact person or role is associated (required)
contact.info=INFO: information about contact (Phone number, email, postal address, etc.) (optional)
sample.name=SPLNAME: name of the sample used to generate the data set (required)
accession=ACC: the accession number assigned arbitrarily to a particular mzData instance (i.e. data) file by the generator of that file (required, defaults to the context name)
instrument.model=MODEL: descriptive name of the instrument (make, model, significant customizations). (required)
instrument.source=SRCE: ionization source (required), one of:
- APCI: Atmospheric Pressure Chemical Ionization (PSI:1000070)
- CI: Chemical Ionization (PSI:1000071)
- EI: Electronic Ionization (PSI:1000072)
- ESI: Electrospray Ionization (PSI:1000073)
- FAB: Fast Atom Bombardment Ionization (PSI:1000074)
- MALDI: Matrix-assisted Laser Desorption Ionization (PSI:1000075)
Mass analyzer component list (e.g. quadrupole, collision cell), ordered to reflect the physical order of the described components in the mass spectrometer. For each analyzer (its number replaces {0}), define as many key-value pairs as necessary (the pair number replaces {1}). At least one analyzer is required, even if no CV appears in the generated XML…
- analyzer.{0}.key.{1}=key name. For examples, see the ontology hereafter: Analyzer Description (PSI:1000010 to PSI:1000025) and Analyzer Type (PSI:1000078 to PSI:1000084)
- analyzer.{0}.value.{1}=value associated with the specified key, if necessary (AnalyzerType=PaulIonTrap; Resolution=2000; ResolutionMethod=FWHM; ResolutionType=Constant; …)
detector.key=term that describes the detector, for example Detector Type (PSI:1000026), Detector Acquisition Mode (PSI:1000027), … up to PSI:1000029, or any CV term where is_a = PSI:1000026 to PSI:1000029 (required, even if no CV appears in the generated XML…)
detector.value=associated value, if necessary
processing.software.name: the official name of the software package used (required)
processing.software.version: the version number of the software package (required)
processing.method.{0}.key: description of the default peak processing method (Deisotoping PSI:1000033; Charge Deconvolution PSI:1000034; Peak Processing PSI:1000035) (optional)
processing.method.{0}.value: associated value (true/false, CentroidMassSpectrum…) (optional)
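
To make the indexed analyzer.{0}.key.{1} and analyzer.{0}.value.{1} properties more concrete, the Python sketch below groups them per analyzer and lists which of the simple required properties are still missing. It is a minimal sketch under stated assumptions: numbering from 1 is a guess (the page does not say whether indices start at 0 or 1), the helper names are hypothetical, and the required keys are limited to the simple ones marked "required" in the list above.

```python
import re

# Simple keys marked "required" in the list above (accession is left out
# because it defaults to the context name).
REQUIRED_KEYS = [
    "contact.name",
    "contact.institute",
    "sample.name",
    "instrument.model",
    "instrument.source",
    "processing.software.name",
    "processing.software.version",
]

# analyzer.<analyzer number>.key.<pair number> / analyzer.<...>.value.<...>
ANALYZER_KEY = re.compile(r"^analyzer\.(\d+)\.(key|value)\.(\d+)$")


def group_analyzers(props):
    """Return {analyzer number: [(key, value), ...]} ordered by pair number."""
    slots = {}
    for name, text in props.items():
        match = ANALYZER_KEY.match(name)
        if not match:
            continue
        analyzer, kind, pair = int(match.group(1)), match.group(2), int(match.group(3))
        slots.setdefault(analyzer, {}).setdefault(pair, {})[kind] = text
    return {
        analyzer: [(p.get("key"), p.get("value", "")) for _, p in sorted(pairs.items())]
        for analyzer, pairs in sorted(slots.items())
    }


def missing_required(props):
    """List the simple required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not props.get(key)]


# Hypothetical, already-parsed properties; analyzer values reuse the examples above.
props = {
    "contact.name": "Hippolyte Calys",
    "instrument.source": "ESI",
    "analyzer.1.key.1": "AnalyzerType",
    "analyzer.1.value.1": "PaulIonTrap",
    "analyzer.1.key.2": "Resolution",
    "analyzer.1.value.2": "2000",
}
analyzers = group_analyzers(props)
missing = missing_required(props)
print(analyzers)
print(missing)
```

Each key listed in `missing` would end up as a '!!!' default value in the exported file.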

Default PRIDE XML file (i.e. obtained if runDescriptor.properties is empty)

<ExperimentCollection version="2.1">
  <Experiment>
    <Title>New Node</Title>
    <ShortLabel>test</ShortLabel>
    <Protocol>
      <ProtocolName>Protocol Name : To replace !</ProtocolName>
    </Protocol>
    <mzData xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="1.05" xsi:noNamespaceSchemaLocation="http://psidev.sourceforge.net/ms/xml/mzdata/mzdata.xsd" accessionNumber="Context_2">
      <description>
        <admin>
          <sampleName>!!! SAMPLE NAME !!!</sampleName>
          <contact>
            <name>!!! USER NAME !!!</name>
            <institution>!!! USER INSTITUTION !!!</institution>
            <contactInfo>!!! USER INFOS / TEL ... !!!</contactInfo>
          </contact>
        </admin>
        <instrument>
          <instrumentName>!!! INSTRUMENT MODEL  !!!</instrumentName>
          <source>
            <cvParam accession="1" cvLabel="psi" name="type" value="!!! INSTRUMENT SOURCE!!!"/>
          </source>
          <analyzerList count="1">
            <analyzer>
            </analyzer>
          </analyzerList>
          <detector>
          </detector>
        </instrument>
        <dataProcessing>
          <software>
            <name>!!! PROCESSING SOFTWARE NAME !!!</name>
            <version>!!! PROCESSING SOFTWARE NAME !!!</version>
          </software>
        </dataProcessing>
      </description>
      <spectrumList count="1623">
      </spectrumList>
    </mzData>
    <GelFreeIdentification>
      <Accession>KPYM_HUMAN</Accession>
      <Database>Sp_Trembl</Database>
    </GelFreeIdentification>
    ...
  </Experiment>
</ExperimentCollection>
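
Since every default value has to be replaced by hand, it can help to list the '!!!' placeholders that remain in an exported file. The Python sketch below does this for a fragment modelled on the default file above. The helper `find_placeholders` is hypothetical, and the sketch only detects values surrounded by '!!!', so a default such as the ProtocolName text is not reported.

```python
import re
import xml.etree.ElementTree as ET

# Default values are surrounded by '!!!', e.g. "!!! SAMPLE NAME !!!".
PLACEHOLDER = re.compile(r"^\s*!!!.*!!!\s*$", re.DOTALL)


def find_placeholders(root):
    """Return (tag, location, value) for each text or attribute still holding a default."""
    hits = []
    for element in root.iter():
        if element.text and PLACEHOLDER.match(element.text):
            hits.append((element.tag, "text", element.text.strip()))
        for attr, value in element.attrib.items():
            if PLACEHOLDER.match(value):
                hits.append((element.tag, attr, value.strip()))
    return hits


# Fragment modelled on the default file, with the contact name already filled in.
fragment = """
<description>
  <admin>
    <sampleName>!!! SAMPLE NAME !!!</sampleName>
    <contact>
      <name>Hippolyte Calys</name>
      <institution>!!! USER INSTITUTION !!!</institution>
    </contact>
  </admin>
  <instrument>
    <source>
      <cvParam accession="1" cvLabel="psi" name="type" value="!!! INSTRUMENT SOURCE!!!"/>
    </source>
  </instrument>
</description>
"""
hits = find_placeholders(ET.fromstring(fragment))
for tag, location, value in hits:
    print(f"{tag} ({location}): {value}")
```

For a complete export, the root element could be obtained with `ET.parse(path).getroot()`, where `path` points to the generated PRIDE XML file.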

Controlled Vocabulary (CV)

A PRIDE XML file uses several CVs, coming from several ontologies.
You can find more details about CVs on the PRIDE website, in the help for using CV.

Here are some examples of Ontologies/CV:

Mass Spectroscopy CV (PSI-MS) [PSI] Browse ontology
PRIDE Controlled Vocabulary [PRIDE] Browse ontology
Protein Modifications (PSI-MOD) [MOD] Browse ontology
NEWT UniProt Taxonomy Database [NEWT] Browse ontology

Click here to have access to all ontologies.
