SEQ-DB

SEQ model schema: seqdb.pdf

List of tables

bio_sequence

(Physical Name: bio_sequence)

Protein sequence.

| Logical Column Name | Physical Column Name | Type | PK | Nullable | Remarks |
|---|---|---|---|---|---|
| id (PK) | id | BIGINT | PK | NOT NULL | Auto incremented Id. |
| sequence | sequence | CLOB | | | Protein sequence (AA or DNA alphabet) normalized to Upper Case. |
| hash | hash | VARCHAR(64) | | | SHA-256 hash of normalized (Upper Case) sequence (as ASCII / ISO 8859-1 byte array). |

id (PK) (id) Auto incremented Id

sequence (sequence) Protein sequence (AA or DNA alphabet) normalized to Upper Case. Uniqueness of the sequence is enforced through the UNIQUE constraint on the "hash" (SHA-256) column.

hash (hash) SHA-256 hash of normalized (Upper Case) sequence (as ASCII / ISO 8859-1 byte array). Hash must be UNIQUE.
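
The hash rule described above can be sketched as follows. This is a minimal illustration, not the sequence repository's actual code: the function name `sequence_hash` is hypothetical, and storing the digest as a lowercase hexadecimal string is an assumption (a SHA-256 digest in hexadecimal is 64 characters long, which fits the VARCHAR(64) column).

```python
import hashlib


def sequence_hash(sequence: str) -> str:
    """Return the SHA-256 hex digest of the upper-cased sequence.

    Assumption: the digest is stored as lowercase hexadecimal text.
    """
    # Normalize to Upper Case, as required for the "sequence" column.
    normalized = sequence.upper()
    # ISO 8859-1 yields the same bytes as ASCII for plain sequence letters;
    # it raises UnicodeEncodeError for characters outside that range.
    data = normalized.encode("iso-8859-1")
    return hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    # Two spellings of the same sequence map to the same hash value.
    print(sequence_hash("mkvlaa") == sequence_hash("MKVLAA"))
```

Because the hash is computed after normalization, two entries that differ only in letter case collide on the UNIQUE "hash" column and are treated as the same sequence.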

Referenced By

parsing_rule

(Physical Name: parsing_rule)

| Logical Column Name | Physical Column Name | Type | PK | Nullable | Remarks |
|---|---|---|---|---|---|
| id (PK) | id | BIGINT | PK | NOT NULL | |
| name | name | VARCHAR(255) | | | UNIQUE. |
| release | release | VARCHAR(255) | | | Rule to parse SEDbInstance version string from source fileName. |
| se_db_identifier | se_db_identifier | VARCHAR(255) | | | |
| repository_identifier | repository_identifier | VARCHAR(255) | | | |
| repo_id_from_se_id | repo_id_from_se_id | VARCHAR(255) | | | |

id (PK) (id)

name (name) UNIQUE.

release (release) Rule to parse SEDbInstance version string from source fileName.

se_db_identifier (se_db_identifier)

repository_identifier (repository_identifier)

repo_id_from_se_id (repo_id_from_se_id)
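
As an illustration of how a "release" rule might be applied to a source file name, here is a minimal sketch. It assumes the rule is stored as a regular expression whose first capturing group is the version string; the schema itself does not state the rule syntax, and the function name `parse_release`, the sample rule, and the sample file name are all hypothetical.

```python
import re
from typing import Optional


def parse_release(file_name: str, release_rule: str) -> Optional[str]:
    """Extract a version string from file_name using a regex rule.

    Assumption: the rule is a regular expression; the first capturing
    group (or the whole match if there is no group) is the version.
    """
    match = re.search(release_rule, file_name)
    if match is None:
        return None
    # Prefer the first capturing group when the rule defines one.
    return match.group(1) if match.groups() else match.group(0)


if __name__ == "__main__":
    # Hypothetical rule: eight digits placed just before ".fasta".
    rule = r"_(\d{8})\.fasta$"
    print(parse_release("uniprot_sprot_20240115.fasta", rule))
```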

Referenced By

repository

(Physical Name: repository)

Standard Repository.

| Logical Column Name | Physical Column Name | Type | PK | Nullable | Remarks |
|---|---|---|---|---|---|
| id (PK) | id | BIGINT | PK | NOT NULL | |
| name | name | VARCHAR(255) | | | UNIQUE. |
| url | url | CLOB | | | |

id (PK) (id)

name (name) UNIQUE.

url (url)

Referenced By

repository_identifier

(Physical Name: repository_identifier)

| Logical Column Name | Physical Column Name | Type | PK | Nullable | Remarks |
|---|---|---|---|---|---|
| id (PK) | id | BIGINT | PK | NOT NULL | |
| value | value | VARCHAR(255) | | | UNIQUE for given repository. |
| repository_id (FK) | repository_id | BIGINT | | | |

id (PK) (id)

value (value) UNIQUE for given repository.

repository_id (FK) (repository_id)
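
"UNIQUE for given repository" reads as a composite uniqueness constraint on (value, repository_id). The sketch below demonstrates that reading with an in-memory SQLite database. It is an assumption-laden illustration: the real schema targets another DBMS, the exact DDL is not given in this document, and the helper names `build_demo_db` and `try_insert` are hypothetical.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create simplified repository tables in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    # INTEGER PRIMARY KEY stands in for the auto-incremented BIGINT id.
    conn.executescript(
        """
        CREATE TABLE repository (
            id INTEGER PRIMARY KEY,
            name VARCHAR(255) UNIQUE,
            url CLOB
        );
        CREATE TABLE repository_identifier (
            id INTEGER PRIMARY KEY,
            value VARCHAR(255),
            repository_id BIGINT REFERENCES repository (id),
            UNIQUE (value, repository_id)
        );
        """
    )
    return conn


def try_insert(conn: sqlite3.Connection, value: str, repository_id: int) -> bool:
    """Insert an identifier; return False if the constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO repository_identifier (value, repository_id) VALUES (?, ?)",
            (value, repository_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    db = build_demo_db()
    db.execute("INSERT INTO repository (id, name) VALUES (1, 'RepoA')")
    db.execute("INSERT INTO repository (id, name) VALUES (2, 'RepoB')")
    print(try_insert(db, "P12345", 1))  # first insert for repository 1
    print(try_insert(db, "P12345", 1))  # same value, same repository
    print(try_insert(db, "P12345", 2))  # same value, other repository
```

Under this reading the same identifier value may appear once per repository, but not twice within one repository.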

References

Referenced By

se_db

(Physical Name: se_db)

Search Engine Db.

| Logical Column Name | Physical Column Name | Type | PK | Nullable | Remarks |
|---|---|---|---|---|---|
| id (PK) | id | BIGINT | PK | NOT NULL | |
| name | name | VARCHAR(255) | | | UNIQUE. |
| alphabet | alphabet | VARCHAR(3) | | | Alphabet used for sequences (AA, DNA…) |
| parsing_rule_id (FK) | parsing_rule_id | BIGINT | | | |
| repository_id (FK) | repository_id | BIGINT | | | |

id (PK) (id)

name (name) UNIQUE.

alphabet (alphabet) Alphabet used for sequences (AA, DNA…)

parsing_rule_id (FK) (parsing_rule_id)

repository_id (FK) (repository_id)

References

Referenced By

se_db_identifier

(Physical Name: se_db_identifier)

| Logical Column Name | Physical Column Name | Type | PK | Nullable | Remarks |
|---|---|---|---|---|---|
| id (PK) | id | BIGINT | PK | NOT NULL | |
| value | value | VARCHAR(255) | | | UNIQUE for given se_db_instance. |
| inferred | inferred | BOOLEAN | | | True if this se_db_identifier is inferred by sequence repository service (SE Db source cannot be loaded or protein description does not match). |
| se_db_instance_id (FK) | se_db_instance_id | BIGINT | | | |
| bio_sequence_id (FK) | bio_sequence_id | BIGINT | | | |
| repository_identifier_id (FK) | repository_identifier_id | BIGINT | | | |

id (PK) (id)

value (value) UNIQUE for given se_db_instance.

inferred (inferred) True if this se_db_identifier is inferred by sequence repository service (SE Db source cannot be loaded or protein description does not match).

se_db_instance_id (FK) (se_db_instance_id)

bio_sequence_id (FK) (bio_sequence_id)

repository_identifier_id (FK) (repository_identifier_id)

References

se_db_instance

(Physical Name: se_db_instance)

Unique version of a SE Db.

| Logical Column Name | Physical Column Name | Type | PK | Nullable | Remarks |
|---|---|---|---|---|---|
| id (PK) | id | BIGINT | PK | NOT NULL | |
| release | release | VARCHAR(50) | | | Version string; if it is a date, it must be formatted as yyyyMMdd. |
| source_path | source_path | CLOB | | | Can be the pathname of a FASTA file relative to the Search Engine file system. |
| source_last_modified_time | source_last_modified_time | TIMESTAMP | | | FASTA file last modified date or SEDbInstance creation timestamp. |
| se_db_id (FK) | se_db_id | BIGINT | | | |

id (PK) (id)

release (release) Version string; if it is a date, it must be formatted as yyyyMMdd. UNIQUE for a given se_db_id.

source_path (source_path) Can be the pathname of a FASTA file relative to the Search Engine file system.

source_last_modified_time (source_last_modified_time) FASTA file last modified date or SEDbInstance creation timestamp.

se_db_id (FK) (se_db_id)
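
The yyyyMMdd convention for date-based release strings can be sketched as follows. This is only an illustration: the helper names `release_from_date` and `is_date_release` are hypothetical, and nothing here is taken from the actual service code.

```python
from datetime import date, datetime


def release_from_date(d: date) -> str:
    """Format a date (or datetime) as a yyyyMMdd release string."""
    return d.strftime("%Y%m%d")


def is_date_release(release: str) -> bool:
    """Return True if release is a valid calendar date in yyyyMMdd form."""
    # Require exactly eight ASCII digits before attempting to parse.
    if len(release) != 8 or not (release.isascii() and release.isdigit()):
        return False
    try:
        datetime.strptime(release, "%Y%m%d")
    except ValueError:
        # Eight digits that do not form a real date, e.g. month 13.
        return False
    return True


if __name__ == "__main__":
    print(release_from_date(date(2024, 1, 5)))
    print(is_date_release("2024-01-05"))
```

A release string that is not a date (for example a plain version label) is still allowed by the column description; the check above only applies when the release is meant to be a date.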

References

Referenced By