

Installing and configuring the Sequence Repository

Although this module is optional, installing it is recommended, especially if you want to view protein sequences in the user interfaces.

It can be installed on the same machine as the Proline Server. However, because this module parses the Mascot FASTA files to extract protein sequences and descriptions, it is more efficient to install it on the computer running your Mascot Server. In either case, the computer where the Sequence Repository is installed should be able to access the PostgreSQL server.

Sequence Repository installation

This module is included in the Proline Server installation (whether you use the installer or a manual installation).

Configuration

Configuration files are located in the “<seqrepo_folder>/config” folder.

Server and Datastore description

The application.conf file defines the datastore and server description used to access the UDS database (when using a PostgreSQL database). The properties specified here should be the same as the ones you specified while configuring the Proline Server.

proline-config {
  driver-type = "postgresql" // valid values are: h2, postgresql
  max-pool-connection=3 // Beta property: maximum number of pooled connections to the DB server (default: 3)
}

// User and password used to connect to the database server
auth-config {
  user="<user-proline>"
  password="<password-proline>"
}

// Database server host
host-config {
  host="<host>"
  port="5432"
}

uds-db {
  connection-properties {
    dbName = "uds_db"
  }
}

h2-config {
  script-directory = "/h2"
  connection-properties {
    connectionMode = "FILE"
    driver = "org.h2.Driver"
  }
}

postgresql-config {
  script-directory = "/postgresql"
  connection-properties {
    connectionMode = "HOST"
    driver = "org.postgresql.Driver"
  }
}

Notes:

  • If you did not change the default database naming scheme, keep 'uds_db' as the dbName value.
  • <user-proline> and <password-proline> must be the same as those specified in the application.conf file of the Proline Server.
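To check that these values are consistent and that the PostgreSQL server is reachable from this computer, the following Java sketch assembles a standard PostgreSQL JDBC URL (jdbc:postgresql://host:port/dbName) from the host-config and uds-db values and tries to open a connection. The class UdsConnectionCheck is hypothetical and not part of the Sequence Repository; it assumes the PostgreSQL JDBC driver (org.postgresql.Driver) is on the classpath, and the module itself may build its connection URL differently.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical connectivity check, not part of the Sequence Repository code base.
class UdsConnectionCheck {

    // Standard PostgreSQL JDBC URL format: jdbc:postgresql://<host>:<port>/<dbName>
    static String jdbcUrl(String host, int port, String dbName) {
        return "jdbc:postgresql://" + host + ":" + port + "/" + dbName;
    }

    public static void main(String[] args) {
        if (args.length < 5) {
            System.err.println("usage: UdsConnectionCheck <host> <port> <dbName> <user> <password>");
            return;
        }
        String url = jdbcUrl(args[0], Integer.parseInt(args[1]), args[2]);
        System.out.println("Connecting to " + url);
        // Needs the PostgreSQL JDBC driver on the classpath; without it getConnection throws SQLException
        try (Connection connection = DriverManager.getConnection(url, args[3], args[4])) {
            System.out.println("Connected to PostgreSQL " + connection.getMetaData().getDatabaseProductVersion());
        } catch (SQLException e) {
            System.err.println("Connection failed: " + e.getMessage());
        }
    }
}
```

Run it with the same values as in application.conf, for example <host> 5432 uds_db <user-proline> <password-proline>.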

Protein description parsing rule

Since this module extracts the protein sequence and description from a FASTA file for a given protein accession, you must configure the rule used to parse the protein accession (ACC) from the FASTA description line. This is similar to the parsing rules specified in Mascot Server. To do this, edit the parsing-rules.conf file. In this file, the following characters must be escaped (that is, prefixed with '\'): '\', ':' and '='.
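The escaping rule can be illustrated with a small Java sketch that converts a regex, as Java's Pattern class sees it, into the form written in parsing-rules.conf by prefixing '\', ':' and '=' with '\'. The helper name escapeForConf is hypothetical; it only applies the rule stated above.

```java
// Hypothetical helper: prepares a Java regex for parsing-rules.conf by
// prefixing every '\', ':' and '=' with a backslash, as required above.
class ConfRegexEscaper {

    static String escapeForConf(String regex) {
        StringBuilder escaped = new StringBuilder();
        for (char c : regex.toCharArray()) {
            if (c == '\\' || c == ':' || c == '=') {
                escaped.append('\\'); // escape the special character
            }
            escaped.append(c);
        }
        return escaped.toString();
    }

    public static void main(String[] args) {
        // The regex as Pattern sees it: >\w{2}\|([^\|]+)\|
        String regex = ">\\w{2}\\|([^\\|]+)\\|";
        // Prints the form used in parsing-rules.conf: >\\w{2}\\|([^\\|]+)\\|
        System.out.println(escapeForConf(regex));
    }
}
```

The printed value is the protein-accession string of the label1 rule in the configuration example.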

// Paths to the FASTA files read by the Sequence Repository daemon. Multiple paths can be specified, separated by ',' inside []
// On Linux systems: local-fasta-directories = ["/local/mascot/sequence"]
local-fasta-directories =["D:\\mascot\\sequence"]

// Rules used to parse FASTA entries. Multiple rules can be specified.
// name : rule identifier
// fasta-name : Java regex (CASE_INSENSITIVE) that the FASTA file name must match. Multiple regexes can be specified, separated by ',' inside []
// fasta-version : Java regex with a capturing group to extract the FASTA release version (CASE_INSENSITIVE)
// protein-accession : Java regex with a capturing group to extract the protein accession

parsing-rules = [{
  name="label1",
  fasta-name=["uniprot"],
  fasta-version="[.]*_([^_]*).fasta",
  protein-accession =">\\w{2}\\|([^\\|]+)\\|"
},
{
  name="label2",
  fasta-name=["myDB"],
  fasta-version="[.]*_([^_]*).fasta",
  protein-accession =">\\w{2}\\|[^\\|]*\\|(\\S+)"
}
]


// Default Java regex with a capturing group for the protein accession, used if the FASTA file name doesn't match any parsing-rules regex
// >(\\S+) : string after '>' and before the first space
default-protein-accession =">(\\S+)"
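The following Java sketch shows one plausible reading of how these settings combine: the first rule whose fasta-name regex is found (case-insensitively) in the FASTA file name is selected, its fasta-version regex extracts the release from the file name, and default-protein-accession is used when no rule matches. The use of Matcher.find() rather than a full match, the first-match order, the file names, and the names ParsingRule and RuleSelectionSketch are all assumptions for illustration; the actual module may behave differently.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of rule selection; regexes are in their unescaped Java form.
class RuleSelectionSketch {

    record ParsingRule(String name, List<String> fastaNames, String fastaVersion, String proteinAccession) {}

    static final List<ParsingRule> RULES = List.of(
        new ParsingRule("label1", List.of("uniprot"), "[.]*_([^_]*).fasta", ">\\w{2}\\|([^\\|]+)\\|"),
        new ParsingRule("label2", List.of("myDB"), "[.]*_([^_]*).fasta", ">\\w{2}\\|[^\\|]*\\|(\\S+)"));

    static final String DEFAULT_PROTEIN_ACCESSION = ">(\\S+)";

    // First rule with a fasta-name regex found in the file name (case-insensitive), if any
    static Optional<ParsingRule> selectRule(String fastaFileName) {
        for (ParsingRule rule : RULES) {
            for (String nameRegex : rule.fastaNames()) {
                if (Pattern.compile(nameRegex, Pattern.CASE_INSENSITIVE).matcher(fastaFileName).find()) {
                    return Optional.of(rule);
                }
            }
        }
        return Optional.empty();
    }

    // Accession regex for a file: the selected rule's, otherwise the default one
    static String accessionRegexFor(String fastaFileName) {
        return selectRule(fastaFileName).map(ParsingRule::proteinAccession).orElse(DEFAULT_PROTEIN_ACCESSION);
    }

    // Release version captured by group 1 of the selected rule's fasta-version regex
    static Optional<String> versionOf(String fastaFileName) {
        Optional<ParsingRule> rule = selectRule(fastaFileName);
        if (rule.isEmpty()) {
            return Optional.empty();
        }
        Matcher m = Pattern.compile(rule.get().fastaVersion(), Pattern.CASE_INSENSITIVE).matcher(fastaFileName);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        for (String file : List.of("uniprot_20160120.fasta", "MYDB_v2.fasta", "other.fasta")) {
            System.out.println(file + " -> rule=" + selectRule(file).map(ParsingRule::name).orElse("default")
                + ", version=" + versionOf(file).orElse("n/a")
                + ", accession regex=" + accessionRegexFor(file));
        }
    }
}
```

Under these assumptions, a file such as other.fasta matches no rule and falls back to >(\S+).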

For example, the label1 rule captures P07259 from the line >sp|P07259|PYR1_YEAST …: the regex matches a '>', then 2 word characters, then '|', then captures everything up to the next '|', followed by that '|'. This rule is applied to all FASTA files whose names are prefixed with uniprot.
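The capture described above can be reproduced with java.util.regex. This sketch applies the label1, label2 and default protein-accession regexes (in their unescaped Java form) to a UniProt-style description line; the text after the entry name is a placeholder, and using Matcher.find() is an assumption about how the module applies the regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: applies protein-accession regexes from parsing-rules.conf to a description line.
class AccessionExtractionSketch {

    static final String LABEL1 = ">\\w{2}\\|([^\\|]+)\\|";     // text between the first two '|'
    static final String LABEL2 = ">\\w{2}\\|[^\\|]*\\|(\\S+)"; // text after the second '|' up to the first space
    static final String DEFAULT = ">(\\S+)";                    // text after '>' up to the first space

    // Returns capturing group 1 of the first match, or null if the line does not match
    static String extractAccession(String regex, String descriptionLine) {
        Matcher m = Pattern.compile(regex).matcher(descriptionLine);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = ">sp|P07259|PYR1_YEAST placeholder description";
        System.out.println("label1  -> " + extractAccession(LABEL1, line));  // P07259
        System.out.println("label2  -> " + extractAccession(LABEL2, line));  // PYR1_YEAST
        System.out.println("default -> " + extractAccession(DEFAULT, line)); // sp|P07259|PYR1_YEAST
    }
}
```

The same line therefore yields a different accession depending on which rule applies to the FASTA file, which is why the rules should be checked before use.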

Testing rules

To verify the configuration, once the previous files have been edited and saved, run the following tool from the Sequence Repository installation directory (replace <version> with the actual version). On Windows systems:

java -cp "config;PM-SequenceRepository-<version>.jar;lib\*" fr.proline.module.seq.service.ListMatchingRules

On Linux systems:

java -cp "config:PM-SequenceRepository-<version>.jar:lib/*" fr.proline.module.seq.service.ListMatchingRules
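The Windows and Linux commands differ mainly in their separators (';' and '\' on Windows, ':' and '/' on Linux). As a sketch, a hypothetical Java launcher could build the same class path with File.pathSeparator and File.separator and start ListMatchingRules through ProcessBuilder. The class ListMatchingRulesLauncher is not part of Proline; it assumes a java executable on the PATH and is started from the installation directory, since the class path entries are relative.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

// Hypothetical launcher: builds the class path of the commands above with the
// platform separators and runs fr.proline.module.seq.service.ListMatchingRules.
class ListMatchingRulesLauncher {

    static String classpath(String version) {
        return String.join(File.pathSeparator,
            "config",
            "PM-SequenceRepository-" + version + ".jar",
            "lib" + File.separator + "*"); // wildcard expanded by the java launcher, no shell involved
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length < 1) {
            System.err.println("usage: ListMatchingRulesLauncher <version>");
            return;
        }
        List<String> command = List.of("java", "-cp", classpath(args[0]),
            "fr.proline.module.seq.service.ListMatchingRules");
        // Relative class path entries: run from the Sequence Repository installation directory
        Process process = new ProcessBuilder(command).inheritIO().start();
        System.exit(process.waitFor());
    }
}
```

Because ProcessBuilder does not go through a shell, the lib/* entry needs no quoting here.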
Last modified: 2016/02/04 10:12 by 132.168.72.225