
IMGT reference sequences

Definition and characteristics

Definition

IMGT reference sequences are chosen on the basis of one or, whenever possible, several of the following criteria:

For the immunoglobulins (IG) and T cell receptors (TR), IMGT reference sequences are defined for the germline V-GENEs, D-GENEs, J-GENEs, and for the C-GENEs.

The IMGT reference sequence for a given allele with a partial L-V-GENE-UNIT (for V), D-GENE-UNIT (for D), J-GENE-UNIT (for J) or C-GENE-UNIT (for C) will be replaced by a complete sequence when one is available and fully annotated.

Characteristics

The characteristics of the IMGT reference sequences are defined according to the IMGT-ONTOLOGY concepts.

Presentation

The presentation of the IMGT reference sequences is of three kinds:

IMGT/LIGM-DB reference sequences

They correspond to the IMGT/LIGM-DB accession numbers in which any part of the sequence has been defined as an IMGT reference sequence for one or more given genes. The IMGT/LIGM-DB reference sequences can be accessed from:

IMGT/GENE-DB reference sequences

The IMGT/GENE-DB sequences correspond to the coding region sequences of the Functional or ORF genes (V-REGION, D-REGION, J-REGION, C-REGION), isolated from the IMGT/LIGM-DB sequences. By definition, there is one sequence for each Functional or ORF allele. If the C-REGION is encoded by several exons, the sequence is given by exon.

IMGT/GENE-DB reference sequences are provided in FASTA format:

To facilitate BLAST searches for expressed (spliced) sequences in IMGT/LIGM-DB, and to increase interoperability with HGNC and external generalist expression databases, IMGT/GENE-DB reference sequences that comprise several exons will also be provided with the exons artificially joined.

Interoperability with genome databases:

IMGT reference directory sequences

The IMGT reference directory sequences correspond to sequence fragments according to IMGT Labels, isolated from the Functional and ORF IMGT/LIGM-DB reference sequences, in which gaps are inserted according to the IMGT unique numbering ('NUMEROTATION' concept of IMGT-ONTOLOGY).

By definition, the IMGT reference directory sets contain one sequence for each allele. Allele names of these sequences are shown in red in Alignments of alleles.

Sets of the IMGT reference directory are used in IMGT/V-QUEST and other IMGT tools. All IMGT reference directory sets can be downloaded in FASTA format.

FASTA header of IMGT reference directory sequences
The same IMGT coding label can be used for cDNA and genomic sequences. However, in the case of splicing frame 1 (sf1) or splicing frame 2 (sf2) (Aide-mémoire, Splicing sites), the delimitations of the coding region in cDNA (based on codons) differ by one or two nucleotides from the 5' or 3' end of the corresponding exon in gDNA.
For that reason, the header of the downloadable IMGT reference directory sequences for coding regions indicates, in column 9, the number of nucleotides added in 5' (for example, +1) and, in column 10, the number of nucleotides added or removed in 3' (for example, -1), compared to the corresponding genomic label extracted from IMGT/LIGM-DB.
The FASTA header contains 15 fields separated by '|':

Example:
>X03604|IGHG3*01|Homo sapiens|F|H1|g,901..950|51 nt|1|+1|-1| | |51+0=51| | |
gagctcaaaaccccacttggtgacacaactcacacatgcccacggtgccca

from the Homo sapiens IGHG3 alleles IMGT reference directory file
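
As an illustration of the header layout described above, the following minimal Python sketch splits the example header on '|' and reads columns 9 and 10 as signed integers. The function names (parse_imgt_header, end_adjustments) are hypothetical and not part of any IMGT tool. Treating an empty column as 0, and reading column 7 as the nucleotide count, are assumptions based only on the example shown here.

```python
def parse_imgt_header(header_line):
    """Split an IMGT reference directory FASTA header into '|'-separated fields."""
    fields = header_line.strip().lstrip(">").split("|")
    # The header ends with a trailing '|', which produces one extra empty item.
    if fields and fields[-1] == "":
        fields.pop()
    return [field.strip() for field in fields]


def end_adjustments(fields):
    """Return the 5' (column 9) and 3' (column 10) adjustments as integers."""
    def to_int(text):
        # Assumption: an empty column means that no nucleotide was added or removed.
        return int(text) if text else 0
    return to_int(fields[8]), to_int(fields[9])


header = ">X03604|IGHG3*01|Homo sapiens|F|H1|g,901..950|51 nt|1|+1|-1| | |51+0=51| | |"
sequence = "gagctcaaaaccccacttggtgacacaactcacacatgcccacggtgccca"

fields = parse_imgt_header(header)
five_prime, three_prime = end_adjustments(fields)

# Assumption: column 7 holds the sequence length, written as "<n> nt" in the example.
declared_length = int(fields[6].split()[0])

print(len(fields))                       # number of header fields
print(five_prime, three_prime)           # adjustments read from columns 9 and 10
print(declared_length == len(sequence))  # does the declared length match the sequence?
```

Splitting on '|' and dropping the final empty item keeps the column numbers used in the text (9 and 10) aligned with list indices 8 and 9.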