SEQ model schema: seqdb.pdf
Protein sequence.
Logical Column Name | Physical Column Name | Type | PK | Nullable | Remarks |
---|---|---|---|---|---|
id (PK) | id | BIGINT | PK | NOT NULL | Auto-incremented id. |
sequence | sequence | CLOB | | | Protein sequence (AA or DNA alphabet) normalized to upper case. Uniqueness is enforced through the UNIQUE constraint on the hash (SHA-256) column. |
hash | hash | VARCHAR(64) | | | SHA-256 hash of the normalized (upper case) sequence, computed over its ASCII / ISO 8859-1 byte array. Must be UNIQUE. |
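A minimal sketch of how such a hash could be computed. It assumes the stored value is the hexadecimal digest (64 characters, which fits VARCHAR(64)) and that normalization consists only of upper-casing; neither point is stated by the schema itself. The function name `sequence_hash` is hypothetical.

```python
import hashlib


def sequence_hash(sequence: str) -> str:
    """Return the SHA-256 hex digest of the upper-cased sequence."""
    # Normalize to upper case, as described for the sequence column.
    normalized = sequence.upper()
    # Encode as ISO 8859-1; for plain sequence letters this equals ASCII.
    data = normalized.encode("iso-8859-1")
    # Assumption: the hash column stores the 64-character hex digest.
    return hashlib.sha256(data).hexdigest()
```

Because the hash is computed after normalization, `sequence_hash("mkv")` and `sequence_hash("MKV")` yield the same value, so the UNIQUE constraint on hash rejects sequences that differ only by case.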
Logical Column Name | Physical Column Name | Type | PK | Nullable | Remarks |
---|---|---|---|---|---|
id (PK) | id | BIGINT | PK | NOT NULL | |
name | name | VARCHAR(255) | | | UNIQUE. |
release | release | VARCHAR(255) | | | Rule used to parse the SEDbInstance version string from the source file name. |
se_db_identifier | se_db_identifier | VARCHAR(255) | | | |
repository_identifier | repository_identifier | VARCHAR(255) | | | |
repo_id_from_se_id | repo_id_from_se_id | VARCHAR(255) | | | |
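The schema does not state the syntax of the release rule. As an illustration only, the sketch below assumes that a rule is a regular expression whose first capture group holds the version string. The helper `parse_release`, the pattern, and the file name are all made up for the example.

```python
import re
from typing import Optional


def parse_release(release_rule: str, file_name: str) -> Optional[str]:
    """Apply a release rule to a file name and return the version string.

    Assumption: the rule is a regular expression and the version is its
    first capture group. Returns None when the rule does not match.
    """
    match = re.search(release_rule, file_name)
    if match is None or match.lastindex is None:
        return None
    return match.group(1)


# Hypothetical rule: eight digits between an underscore and ".fasta".
rule = r"_(\d{8})\.fasta$"
print(parse_release(rule, "mydb_20140101.fasta"))  # prints 20140101
```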
Standard Repository.
Logical Column Name | Physical Column Name | Type | PK | Nullable | Remarks |
---|---|---|---|---|---|
id (PK) | id | BIGINT | PK | NOT NULL | |
name | name | VARCHAR(255) | | | UNIQUE. |
url | url | CLOB | | | |
Logical Column Name | Physical Column Name | Type | PK | Nullable | Remarks |
---|---|---|---|---|---|
id (PK) | id | BIGINT | PK | NOT NULL | |
value | value | VARCHAR(255) | | | UNIQUE for a given repository. |
repository_id (FK) | repository_id | BIGINT | | | |
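"UNIQUE for a given repository" reads as a composite uniqueness constraint over the value and repository_id columns. The sketch below illustrates that reading with an in-memory SQLite database. The table names `repository` and `repository_identifier` are assumptions inferred from the foreign key column names, the identifier values are made up, and SQLite's type handling differs from that of the target database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE repository (
    id INTEGER PRIMARY KEY,
    name VARCHAR(255) UNIQUE,
    url CLOB
);
CREATE TABLE repository_identifier (
    id INTEGER PRIMARY KEY,
    value VARCHAR(255),
    repository_id BIGINT REFERENCES repository (id),
    -- The same value may appear in several repositories,
    -- but only once per repository.
    UNIQUE (value, repository_id)
);
""")

conn.execute("INSERT INTO repository (id, name) VALUES (1, 'repo_a'), (2, 'repo_b')")
conn.execute("INSERT INTO repository_identifier (value, repository_id) VALUES ('ID_0001', 1)")
# Same value in another repository: accepted.
conn.execute("INSERT INTO repository_identifier (value, repository_id) VALUES ('ID_0001', 2)")


def insert_duplicate() -> bool:
    """Try to insert an existing (value, repository_id) pair."""
    try:
        conn.execute(
            "INSERT INTO repository_identifier (value, repository_id) VALUES ('ID_0001', 1)"
        )
    except sqlite3.IntegrityError:
        return False  # rejected by the composite UNIQUE constraint
    return True
```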
Search Engine Db.
Logical Column Name | Physical Column Name | Type | PK | Nullable | Remarks |
---|---|---|---|---|---|
id (PK) | id | BIGINT | PK | NOT NULL | |
name | name | VARCHAR(255) | | | UNIQUE. |
alphabet | alphabet | VARCHAR(3) | | | Alphabet used for sequences (AA, DNA…). |
parsing_rule_id (FK) | parsing_rule_id | BIGINT | | | |
repository_id (FK) | repository_id | BIGINT | | | |
Logical Column Name | Physical Column Name | Type | PK | Nullable | Remarks |
---|---|---|---|---|---|
id (PK) | id | BIGINT | PK | NOT NULL | |
value | value | VARCHAR(255) | | | UNIQUE for a given se_db_instance. |
inferred | inferred | BOOLEAN | | | True if this se_db_identifier is inferred by the sequence repository service (the SE Db source cannot be loaded or the protein description does not match). |
se_db_instance_id (FK) | se_db_instance_id | BIGINT | | | |
bio_sequence_id (FK) | bio_sequence_id | BIGINT | | | |
repository_identifier_id (FK) | repository_identifier_id | BIGINT | | | |
Unique version of an SE Db.
Logical Column Name | Physical Column Name | Type | PK | Nullable | Remarks |
---|---|---|---|---|---|
id (PK) | id | BIGINT | PK | NOT NULL | |
release | release | VARCHAR(50) | | | Version string; if it is a date, it must be formatted as yyyyMMdd. UNIQUE for a given se_db_id. |
source_path | source_path | CLOB | | | Can be the path of a FASTA file, relative to the search engine file system. |
source_last_modified_time | source_last_modified_time | TIMESTAMP | | | Last modified date of the FASTA file, or SEDbInstance creation timestamp. |
se_db_id (FK) | se_db_id | BIGINT | | | |
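The yyyyMMdd requirement on date-based release strings can be checked before a row is written. The following is a minimal sketch; the helper name `is_valid_release_date` is hypothetical, and it only covers releases that are dates, since other version strings are also allowed.

```python
from datetime import datetime


def is_valid_release_date(release: str) -> bool:
    """Return True if release is a real calendar date written as yyyyMMdd."""
    # Require exactly eight ASCII digits; strptime alone would also
    # accept shorter strings such as "2014011".
    if len(release) != 8 or not (release.isascii() and release.isdigit()):
        return False
    try:
        datetime.strptime(release, "%Y%m%d")
    except ValueError:
        return False  # e.g. month 13 or 30 February
    return True
```

For example, `is_valid_release_date("20140101")` is true, while `"2014-01-01"` and `"20140230"` are both rejected.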