IMGT/GENE-DB

Introduction

IMGT/GENE-DB is part of IMGT®, the international ImMunoGeneTics information system®, the high-quality integrated knowledge resource specializing in the immunoglobulins (IG) or antibodies, T cell receptors (TR), and major histocompatibility (MH) of human and other vertebrate species, proteins of the immunoglobulin superfamily (IgSF) and MH superfamily (MhSF), related proteins of the immune systems (RPI) of vertebrates and invertebrates, therapeutic monoclonal antibodies (mAb), and fusion proteins for immune applications (FPIA), created in 1989 by Marie-Paule Lefranc (LIGM, Université Montpellier 2, CNRS).

IMGT/GENE-DB is the IMGT genome database for IG and TR genes from human, mouse and other vertebrates, on the Web since February 2003.
IMGT/GENE-DB provides a full characterization of the genes and of their alleles: IMGT gene name and definition, chromosomal localization, number of alleles and, for each allele, the IMGT allele functionality, the IMGT reference sequences and other sequences from the literature. IMGT/GENE-DB allele reference sequences are available in FASTA format (nucleotide and amino acid sequences with IMGT gaps according to the IMGT unique numbering, or without gaps). IMGT/GENE-DB includes links to the IMGT Repertoire standardized resources (Chromosomal localization, Locus representation, Tables of alleles, Alignments of alleles, IMGT Protein displays, IMGT Colliers de Perles, etc.) and to the IMGT/LIGM-DB, IMGT/3Dstructure-DB and IMGT/2Dstructure-DB databases.

IMGT/GENE-DB is the official repository of all of the IG and TR genes and alleles approved by the World Health Organization (WHO)/International Union of Immunological Societies (IUIS) Nomenclature Subcommittee for IG and TR (Lefranc 2007, 2008a). Reciprocal links exist between IMGT/GENE-DB and both the HUGO Gene Nomenclature Committee (HGNC) database and NCBI Gene at the National Center for Biotechnology Information (NCBI).

IMGT/GENE-DB Query page

The IMGT/GENE-DB Query page shows, on the top right, the status of the database (current date, number of genes, number of alleles and number of species).

Search according to the concepts of IMGT-ONTOLOGY

Searches in IMGT/GENE-DB are performed according to the concepts of IDENTIFICATION, LOCALIZATION and CLASSIFICATION of IMGT-ONTOLOGY.

IDENTIFICATION

Species:: only species for which genes have been entered in IMGT/GENE-DB are available.
MolecularComponent:: IG or TR.
GeneType:: allows the selection of one of the gene types.
Functionality:: allows the selection of an IG and TR functionality.
Clone name:: enter a clone name or the first letters of a clone name. Clone names are those of the "Reference sequences" and "Sequences from the literature" columns in Genes tables.

LOCALIZATION

Locus:: allows the selection of IG and TR loci (includes main loci and chromosomal orphon sets).
Main Locus:: allows the selection of a main IG or TR locus.
Chromosomal orphon set:: allows the selection of IG or TR genes located outside the main loci, in a chromosomal orphon set.

As a Main Locus may contain RPI genes, selecting 'IG' or 'TR' as 'Molecular component' before submitting the request restricts the results to IG or TR genes.

CLASSIFICATION

IMGT group:

allows the selection of IMGT groups.

IMGT subgroup:

allows the selection of IMGT subgroups.

IMGT gene:

enter an IMGT gene name, for example IGHV1-2 (List of human genes according to the IMGT nomenclature).

Note that the search is case sensitive and that UPPERcase is the rule.

You can also enter only the first letters of the IMGT gene name: for example the selection of IGHV will list in the next page all genes which have an IMGT gene name beginning with IGHV.
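The prefix search described above behaves like a case-sensitive "starts with" test. A minimal sketch in Python, assuming a small hand-written list of gene names (the list and the function name `search_by_gene_name` are illustrative and are not part of IMGT/GENE-DB):

```python
def search_by_gene_name(gene_names, query):
    """Return the names that begin with `query`, compared case-sensitively."""
    return [name for name in gene_names if name.startswith(query)]


# Hypothetical sample of gene names, for illustration only.
GENES = ["IGHV1-2", "IGHV1-3", "IGHD1-1", "TRAV8-3", "TRAV8-4"]

ighv_hits = search_by_gene_name(GENES, "IGHV")   # names beginning with IGHV
lower_hits = search_by_gene_name(GENES, "ighv")  # empty: lowercase does not match
```

Because the comparison is case sensitive, a lowercase query returns nothing, which is why uppercase is the rule.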

You can consult the Correspondence between nomenclatures.

Selection of genes which have been found:

Allows the selection of genes which have been found rearranged and/or transcribed for at least one allele.

LOCALIZATION IN GENOME ASSEMBLIES

Species:: only species for which the gene localization in genome assemblies is managed in IMGT/GENE-DB are available.
Locus:: only loci for which the gene localization in genome assemblies is managed in IMGT/GENE-DB are available.
Assembly:: allows the selection of the assembly.
Assembly unit:: allows the selection of the assembly unit, for example "Primary Assembly".
Designation:: allows the selection of the designation, for example "Full chromosome 14" (for Homo sapiens).

IMGT/GENE-DB direct links

Provides a set of direct links to query IMGT/GENE-DB according to an IMGT gene name, an IMGT group or to get the links to IMGT/GENE-DB and generalist genomic databases.

RESULTS OF YOUR SEARCH

Depending on the number of resulting genes, you will see:

for 0 resulting genes: the message "There are no genes in IMGT/GENE-DB according to your criteria"
for one or more resulting gene(s): List of resulting genes

List of resulting genes

At the top of the page, the selected criteria are indicated with the number of resulting genes and the number of resulting alleles.

The list of resulting genes is a table with the following columns:

First column: select

Allows you to select the genes before going to Choose your display.
In the example, Homo sapiens TRAV8-3 and TRAV8-4 have been selected.

IMGT gene names

Provides the gene names in the IMGT gene nomenclature (List of human genes).

Functionality

Provides the IMGT gene functionality according to the IMGT definition.

F: Functional
ORF: Open Reading Frame
P: Pseudogene

The Functionality may be shown between parentheses or between brackets: corresponding rules are available here

When more than one functionality is indicated for a gene (for example F, [F]), this means that the gene has several alleles with distinct functionalities.
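As an illustration of the notation above, the following sketch splits a Functionality cell such as "F, [F]" into its individual entries and records whether each one is written plainly, between parentheses or between brackets. The function name and the returned tuple layout are assumptions made for this example. The meaning of parentheses and brackets is defined by the IMGT rules referenced above and is not interpreted here.

```python
VALID_CODES = {"F", "ORF", "P"}


def parse_functionality(cell):
    """Split a cell like 'F, [F]' into (code, notation) pairs."""
    entries = []
    for raw in cell.split(","):
        token = raw.strip()
        if token.startswith("(") and token.endswith(")"):
            notation, code = "parentheses", token[1:-1]
        elif token.startswith("[") and token.endswith("]"):
            notation, code = "brackets", token[1:-1]
        else:
            notation, code = "plain", token
        if code not in VALID_CODES:
            raise ValueError(f"unexpected functionality: {token!r}")
        entries.append((code, notation))
    return entries


parsed = parse_functionality("F, [F]")
# More than one distinct entry means the alleles differ in functionality.
has_distinct_functionalities = len(set(parsed)) > 1
```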

IMGT gene definition

Provides the gene definition according to the IMGT gene nomenclature.

Number of alleles

Provides, per gene, the number of alleles currently identified in IMGT.

Chromosomal localization

IMGT/LIGM-DB reference sequence for allele *01

Provides, for the allele *01, the IMGT/LIGM-DB accession number(s) of the corresponding reference sequence(s).

Molecular component

Provides the molecular component for the gene.

Choose your display

Three choices of display are provided:

"Complete IMGT/GENE-DB entries" is selected by default. It displays the detailed results for the selected genes (see IMGT/GENE-DB DETAILED RESULTS).

"IMGT/GENE-DB reference sequences in FASTA format" for the selected genes corresponds to :

F+ORF+all P nucleotide sequences for coding region(s) or exon(s)
F+ORF+in-frame P nucleotide sequences for coding region(s) or exon(s)
F+ORF+in-frame P nucleotide sequences with IMGT gaps for V and C genes for coding region(s) or exon(s)
F+ORF+in-frame P amino acid sequences for coding region(s) or exon(s)
F+ORF+in-frame P amino acid sequences with IMGT gaps for coding region(s) or exon(s)

The FASTA header of IMGT/GENE-DB reference sequences in FASTA format is standardized. See FASTA format of IMGT/GENE-DB reference sequences.

"IMGT label extraction from IMGT/LIGM-DB reference sequences" allows to extract, from the IMGT/LIGM-DB reference sequences, and for each allele of the selected gene(s), the sequences corresponding to one or several IMGT labels and/or artificially spliced exons.

The list of IMGT/LIGM-DB labels is available here.

First click the Choose label(s) for extraction and/or artificially spliced exons button.
Select one or more labels in the list: the list of extractable labels is dynamically deduced from the labels described in the IMGT/LIGM-DB reference sequence for the allele *01 of the resulting genes listed above.
The extraction of artificially spliced and/or combined labels is proposed, provided that all labels are described in the same IMGT/LIGM-DB reference sequence accession number:
- For V genes (see prototype and graphical representation)
  - L-PART1+L-PART2 : corresponds to the artificially spliced sequence of L-PART1 and L-PART2
  - L-PART1+V-EXON : corresponds to the artificially spliced sequence of L-PART1 and V-EXON
  - L-PART1-to-V-EXON: corresponds to the sequence beginning with L-PART1 and ending with V-EXON, including V-INTRON
  - L-PART1-to-V-RS : corresponds to the sequence beginning with L-PART1 and ending with V-RS, including V-INTRON and V-EXON
- For D genes (see prototype and graphical representation)
  - 5'D-RS-to-3'D-RS: corresponds to the sequence beginning with 5'D-RS and ending with 3'D-RS, including D-REGION
- For J genes (see prototype and graphical representation)
  - J-RS-to-J-REGION: corresponds to the sequence beginning with J-RS and ending with J-REGION
- For C genes (see prototypes and graphical representations)
  - For IG genes:
    - artificially spliced exons with CHS, artificially spliced exons with membrane exon(s), artificially spliced exons without CHS and membrane exons:
      - CH1+H+CH2+CH3+CHS, CH1+H+CH2+CH3+M1+M2, CH1+H+CH2+CH3 (for example for human IGHG1 gene)
      - CH1+H+CH2+CH3+CHS, CH1+H+CH2+CH3+M, CH1+H+CH2+CH3 (for example for human IGHA1 genes)
      - CH1+H1+H2+CH2+CH3+CHS, CH1+H1+H2+CH2+CH3+M1+M2, CH1+H1+H2+CH2+CH3 (for example for human IGHD gene)
      - CH1+H1+H2+H3+H4+CH2+CH3+CHS, CH1+H1+H2+H3+H4+CH2+CH3+M1+M2, CH1+H1+H2+H3+H4+CH2+CH3 (for example for human IGHG3*01 allele)
      - CH1+H1+H2+H3+H4+H5+CH2+CH3+CHS, CH1+H1+H2+H3+H4+H5+CH2+CH3+M1+M2, CH1+H1+H2+H3+H4+H5+CH2+CH3 (for example for Gorilla gorilla gorilla (western lowland gorilla) IGHG3A*02 allele)
      - CH1+CH2+CH3+CH4+CHS, CH1+CH2+CH3+CH4+M1+M2, CH1+CH2+CH3+CH4 (for example for human IGHM or IGHE genes)
      Note that in field #8 (codon start) of the FASTA header:
      - '1' indicates that the nucleotide resulting from the splicing with the J gene (first 5' nucleotide of the first codon) is present (cDNA sequence). In that case, artificially spliced exons with CHS and artificially spliced exons with membrane exon(s) correspond to C-REGION.
      - '3' indicates that the nucleotide resulting from the splicing with the J gene is absent (gDNA sequence). In that case, artificially spliced exons with CHS and artificially spliced exons with membrane exon(s) correspond to C-REGION minus the first 5' nucleotide of the first codon resulting from the splicing with the J gene.
    - combined labels:
      - CH1-to-CH3: corresponds to the sequence beginning with CH1 and ending with CH3, including intermediate exons and introns
      - CH1-to-CH4: corresponds to the sequence beginning with CH1 and ending with CH4, including intermediate exons and introns
      - CH1-to-CHS: corresponds to the sequence beginning with CH1 and ending with CHS, including intermediate exons and introns
      - CH1-to-M: corresponds to the sequence beginning with CH1 and ending with M, including intermediate exons and introns
      - CH1-to-M2: corresponds to the sequence beginning with CH1 and ending with M2, including intermediate exons and introns
  - For TR genes:
    - artificially spliced exons
      - EX1+EX2+EX3
      - EX1+EX2R+EX2+EX3
      - EX1+EX2T+EX2R+EX2+EX3
      - EX1+EX2A+EX3
      - EX1+EX2A+EX2B+EX3
      - EX1+EX2A+EX2B+EX2C+EX3
      - EX1+EX2+EX3+EX4
      - EX1+EX2+EX3+EX4UTR
      Note that in field #8 (codon start) of the FASTA header:
      - '1' indicates that the nucleotide resulting from the splicing with the J gene (first 5' nucleotide of the first codon) is present (cDNA sequence). In that case, artificially spliced exons correspond to C-REGION.
      - '3' indicates that the nucleotide resulting from the splicing with the J gene is absent (gDNA sequence). In that case, artificially spliced exons correspond to C-REGION minus the first 5' nucleotide of the first codon resulting from the splicing with the J gene.
    - combined labels:
      - EX1-to-EX3: corresponds to the sequence beginning with EX1 and ending with EX3, including intermediate exons and introns
      - EX1-to-EX4: corresponds to the sequence beginning with EX1 and ending with EX4, including intermediate exons and introns
      - EX1-to-EX4UTR: corresponds to the sequence beginning with EX1 and ending with EX4UTR, including intermediate exons and introns (for TRA and TRD constant genes).
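The difference between an artificially spliced label (for example L-PART1+V-EXON) and a combined label (for example L-PART1-to-V-EXON), and the role of the codon start, can be sketched on a toy sequence. Everything below is hypothetical: the sequence, the label positions and the function names are invented for illustration. The sketch also assumes that codon start gives the 1-based position of the first nucleotide of the first complete codon, which is consistent with the '1' (cDNA) and '3' (gDNA) cases described above but is not stated explicitly in this document.

```python
def extract(seq, start, end):
    """Return seq[start..end] using 1-based, inclusive positions."""
    return seq[start - 1:end]


def spliced(seq, positions, names):
    """Artificially spliced label: concatenate the labels, dropping what lies between."""
    return "".join(extract(seq, *positions[name]) for name in names)


def combined(seq, positions, first, last):
    """Combined label: everything from the start of `first` to the end of `last`."""
    return extract(seq, positions[first][0], positions[last][1])


def complete_codons(seq, codon_start):
    """Keep only complete codons, starting at the 1-based `codon_start`."""
    body = seq[codon_start - 1:]
    return body[:len(body) - len(body) % 3]


# Toy sequence: exons in uppercase, intron in lowercase (hypothetical data).
TOY = "ATGGCA" + "gtaag" + "GTGTCC"
POSITIONS = {"L-PART1": (1, 6), "V-EXON": (12, 17)}

l_part1_plus_v_exon = spliced(TOY, POSITIONS, ["L-PART1", "V-EXON"])
l_part1_to_v_exon = combined(TOY, POSITIONS, "L-PART1", "V-EXON")

cdna_like = complete_codons("ACTGCATC", 1)  # first codon is complete
gdna_like = complete_codons("CTGCATCC", 3)  # first codon lacks its 5' nucleotide
```

The spliced result omits the intron, whereas the combined result keeps it.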
Note that these functionalities are not yet available for conventional genes.
Then select which sequence type you wish to download: Nucleotide sequences or Amino acid sequences.
For Nucleotide sequences only, you can extend the extraction in 5' and/or in 3'. Enter the number of nucleotides you wish to add.
If the extension exceeds the IMGT/LIGM-DB sequence length, you will extract the total sequence. This option is not available for combined labels.
Click here for examples of results.
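The 5' and 3' extension of a nucleotide extraction can be sketched as follows. This is only one possible reading of the rule above: it assumes that an extension running past either end of the IMGT/LIGM-DB sequence is cut at that end, so that a very large extension returns the whole sequence. The function name, parameters and sample data are hypothetical.

```python
def extract_with_extension(seq, start, end, ext5=0, ext3=0):
    """Extract seq[start..end] (1-based, inclusive) extended by ext5 and ext3 nucleotides.

    The extended range is clamped to the bounds of `seq` (assumption).
    """
    if ext5 < 0 or ext3 < 0:
        raise ValueError("extensions must be non-negative")
    first = max(1, start - ext5)
    last = min(len(seq), end + ext3)
    return seq[first - 1:last]


SEQ = "AACCGGTTAACC"  # hypothetical 12-nucleotide reference sequence

label_only = extract_with_extension(SEQ, 5, 8)
extended = extract_with_extension(SEQ, 5, 8, ext5=2, ext3=2)
oversized = extract_with_extension(SEQ, 5, 8, ext5=100, ext3=100)
```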

IMGT/GENE-DB DETAILED RESULTS

The IMGT/GENE-DB DETAILED RESULTS page provides the IMGT/GENE-DB entry or entries. The top of the page lists the gene(s) you have selected; click any of them to view the corresponding IMGT/GENE-DB entry.

Content of an IMGT/GENE-DB entry:

IMGT gene name and definition

Provides the IMGT gene name (species and symbol in the IMGT gene nomenclature) and the IMGT definition (full name) of the gene.

Chromosomal localization

Provides the name of the locus (main locus or chromosomal orphon set), the chromosome number and the cytogenetic localization on the chromosome when known.

Localizations in genome assemblies

Provides the localizations of the gene and IMGT labels in the genome assemblies, if managed in IMGT/GENE-DB:

Name of the assembly
Assembly unit
Designation
Accession number in the assembly
IMGT allele name if identified and validated by IMGT biocurators
IMGT functionality of the allele if identified
IMGT labels
Positions of the gene and IMGT labels in the assembly; the link retrieves the corresponding FASTA sequence.
Orientation of the gene and IMGT labels in the assembly.

Number of alleles

Provides the number of alleles which have been currently identified in IMGT.

IMGT reference alleles

Provides a table listing all identified alleles. For each allele, the table indicates:

its functionality
the names of the exons (for constant and conventional genes)
the R column (for variable, diversity and joining genes) (if defined): it indicates whether the allele has been found (+) or not found (-) rearranged (R).
the T and Pr columns (if defined): they indicate whether the gene sequences have been found (+) or not found (-) transcribed (T) and/or translated into protein (Pr).
the IMGT/LIGM-DB reference sequence with:

the subspecies (if relevant and if defined)
the strain or breed or isolate (if relevant and if defined)
the clone name (if defined)
the accession number
the secondary accession numbers (if defined)
the molecule type (DNA or cDNA)
the specificity of cDNA sequences is indicated in the last column on the right when known.

Below the IMGT reference alleles table, a second table provides links to display the IMGT/GENE-DB reference sequences in FASTA format.

IMGT/GENE-DB reference sequences in FASTA format

The IMGT/GENE-DB reference sequences in FASTA format are provided according to the gene type.

The FASTA header is standardized according to FASTA format of IMGT/GENE-DB reference sequences.

For V genes

V-REGION

- F+ORF+all P: provides the nucleotide sequences of V-REGION for functional, ORF and all pseudogene alleles of the gene(s).

- F+ORF+in-frame P: provides the nucleotide and amino acid sequences of V-REGION for functional, ORF and in-frame pseudogene alleles of the gene(s). The nucleotide sequences and the amino acid sequences are provided with IMGT gaps according to the IMGT unique numbering (IMGT Scientific chart).

L-PART1+V-EXON

- F+ORF+all P: provides the nucleotide sequences of the artificially spliced L-PART1 and V-EXON for functional, ORF and all pseudogene alleles of the gene(s).

- F+ORF+in-frame P: provides the amino acid sequences of the artificially spliced L-PART1 and V-EXON for functional, ORF and in-frame pseudogene alleles of the gene(s).
For D or J genes

- F+ORF+all P: provides the nucleotide sequences of D-REGION or J-REGION for functional, ORF and all pseudogene alleles of the D or J gene(s) respectively.

- F+ORF+in-frame P: provides the amino acid sequences of D-REGION or J-REGION for functional, ORF and in-frame pseudogene alleles of the D or J gene(s) respectively.
Note that the J-REGION sequences in cDNA and in gDNA differ by one nucleotide in 3'. In FASTA format, this nucleotide is restored if the reference sequence is from cDNA.
For C genes and conventional genes

Individual constant exon(s)

- F+ORF+in-frame P: provides the nucleotide sequences of individual constant exon(s) for functional, ORF and in-frame pseudogene alleles of the C gene(s).

- F+ORF+in-frame P with IMGT gaps: provides the nucleotide and amino acid sequences with gaps of individual constant exon(s) for functional, ORF and in-frame pseudogene alleles of the C gene(s). Gaps are according to the IMGT unique numbering (IMGT Scientific chart).
Note that:
- For exons of C-GENE or GENE, if the splicing frame is 1 or 2, a nucleotide is added in 5' of these exons to obtain a complete first codon.
  In the FASTA header, in field 6, the added nucleotide is indicated followed by a comma before the start position.
  Note that the number of added nucleotides in 5' is indicated in the FASTA header field 9 (see FASTA format of IMGT/GENE-DB reference sequences).
- For exons of C-GENE or GENE, if the splicing frame is 1 or 2, a nucleotide is deleted in 3' of these exons to obtain a complete last codon.
  In the FASTA header, in field 6, the end position is decreased by the number of deleted nucleotides in 3'.
  Note that the number of removed nucleotides in 3' is indicated in the FASTA header field 10 (see FASTA format of IMGT/GENE-DB reference sequences).
IMGT gaps
Gaps of the IMGT/GENE-DB reference sequences with IMGT gaps are shown for the positions unoccupied based on the IMGT unique numbering 'for C-DOMAIN' (see 'Range of strand, turn and loop lengths in C-DOMAIN and C-LIKE-DOMAIN' https://www.imgt.org/IMGTScientificChart/Numbering/IMGTIGVCsuperfamily.html).
In particular, they include the following additional positions for C-DOMAIN:
1.8-1.1 (A-STRAND)
15.1-15.3 (AB-TURN)
45.1-45.7 (CD-STRAND)
84.1-84.7, 85.7-85.1 (DE-TURN)
96.1-96.2 (EF-TURN).
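When working with gapped reference sequences, it is often useful to count or remove the IMGT gaps. The sketch below assumes that a gap is written as a dot ('.') in the FASTA sequence, which this page does not state, and uses an invented toy sequence. It also rebuilds a string in the "nt (or AA)+IMGT gaps=total" form used by field 13 of the FASTA header.

```python
GAP = "."  # assumption: IMGT gaps appear as dots in the sequence


def ungap(sequence):
    """Return the sequence with all gap characters removed."""
    return sequence.replace(GAP, "")


def character_summary(sequence):
    """Return 'residues+gaps=total' for a gapped sequence."""
    gaps = sequence.count(GAP)
    residues = len(sequence) - gaps
    return f"{residues}+{gaps}={len(sequence)}"


GAPPED = "CAGGTG...CAG......TCT"  # hypothetical gapped nucleotide sequence

without_gaps = ungap(GAPPED)
summary = character_summary(GAPPED)
```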

Artificially spliced exon(s)

- F+ORF+in-frame P: provides the nucleotide and amino acid sequences of the artificially spliced exons for functional, ORF and in-frame pseudogene alleles of the C gene(s).

Note that the sequences include one nucleotide from the upstream donor exon, added in 5' to obtain a complete first codon.

Other sequences from the literature (compiled in IMGT gene tables, IMGT Repertoire)

Provides, for a given reference allele, the other sequences from the literature corresponding to that allele. For each allele of the gene, the IMGT/LIGM-DB reference sequence is indicated with the clone name (if known), the accession number and the molecule type (DNA or cDNA).
The specificity of cDNA sequences is indicated in the last column on the right when known.

IMGT Repertoire links

Provides additional IMGT Web resources concerning the gene in relation with its locus and group available in IMGT Repertoire.

Annotated IMGT/LIGM-DB cDNA sequences

Provides:
- the number of annotated IMGT/LIGM-DB cDNA sequences for the selected gene.
- a link to a table of annotated IMGT/LIGM-DB cDNA sequences with the accession number, the IMGT allele name, the sequence length, the sequence functionality, the sequence definition and the specificity.

Annotated IMGT/LIGM-DB rearranged genomic DNA sequences

Provides:
- the number of annotated IMGT/LIGM-DB rearranged genomic DNA sequences for the selected gene.
- a link to a table of annotated IMGT/LIGM-DB rearranged genomic DNA sequences with the accession number, the IMGT allele name, the sequence length, the sequence functionality, the sequence definition and the specificity.

Annotated IMGT/3Dstructure-DB structures

Provides:
- the number of annotated IMGT/3Dstructure-DB structures for the selected gene.
- a link to a table of annotated IMGT/3Dstructure-DB structures with the PDB code, the IMGT allele name, the IMGT protein name, the IMGT receptor type, the IMGT receptor description, the species, the chain ID.

External links

Provides external links concerning the gene to other nomenclature, genome and sequence databases.

IMGT label extraction from IMGT/LIGM-DB reference sequences

"IMGT label extraction from IMGT/LIGM-DB reference sequences" is one of the three choices of Choose your display in RESULTS OF YOUR SEARCH.
It provides, for each allele of the selected gene(s), in FASTA format, the nucleotide sequences or the amino acid sequences corresponding to the selected label(s) extracted from the IMGT/LIGM-DB reference sequences.
Nucleotide sequences are provided for F+ORF+all P alleles.
Amino acid sequences are provided for F+ORF+in-frame P alleles.

Three examples are displayed below:
- Example of extraction of the FR3-IMGT label and the L-PART1+V-EXON artificially spliced label in nucleotides
- Example of extraction of the L-PART1 label in nucleotides with extension of 5 nucleotides in 5' and 30 nucleotides in 3'
- Example of extraction of the L-PART1+V-EXON artificially spliced label in amino acids

Note that the FASTA header is standardized according to FASTA format of IMGT/GENE-DB reference sequences. In addition, in case of extension with nucleotides in 5' and/or in 3', the added nucleotides in 5' and in 3' are indicated in field 6 of the FASTA header (see example).

Example of extraction of the FR3-IMGT label and the L-PART1+V-EXON artificially spliced label in nucleotides

Example of extraction of the L-PART1 label in nucleotides with extension of 5 nucleotides in 5' and 30 nucleotides in 3' (see Choose your display)

Note that the numbers of added nucleotides in 5' and in 3' are indicated in field 6 of the FASTA header.

Example of extraction of the L-PART1+V-EXON artificially spliced label in amino acids

IMGT/GENE-DB LOCALIZATION IN GENOME ASSEMBLIES

The genomic localizations of IMGT genes are provided according to the selection: Species, Locus, Assembly, Assembly unit and Designation.

At the top of the page, the species and locus are indicated with the chromosomal localization and the orientation of the locus on the chromosome.
The number of localized genes in the assembly is then indicated, with the corresponding number of labels in parentheses.
A link displays the list of genes of the locus that are not localized in the selected assembly, if any.

The table comprises one line per localized gene, including:

IMGT information regarding the gene in the locus:
- the IMGT gene name
- the IMGT gene order in the locus
- the orientation of the gene in the locus.
- the IMGT allele name and its functionality, if identified and validated by IMGT biocurators, except for Mus musculus (mouse) loci.
  Note that for Mus musculus (mouse) loci, the information provided is for IMGT allele *01.
- For the identified alleles, the IMGT/LIGM-DB accession numbers of the reference sequences.
- For the identified alleles, IMGT labels and positions in the reference sequences. Positions are provided for:
  - L-V-GENE-UNIT and V-REGION for V genes
  - D-GENE-UNIT and D-REGION for D genes
  - J-GENE-UNIT and J-REGION for J genes
  - C-GENE-UNIT and C exons, C domain and/or C-REGION for C genes
HGNC gene ID (for Mus musculus (mouse): MGI gene ID; for Danio rerio (zebrafish): ZNC gene ID).
NCBI information and IMGT label positions:
- NCBI gene ID
- NCBI accession number
- IMGT label positions in the NCBI accession number, except for Mus musculus (mouse) loci.
  Note that for Mus musculus (mouse) loci, positions are those provided by NCBI. For V genes, positions correspond to L-PART1+V-INTRON+V-EXON.

FASTA format of IMGT/GENE-DB reference sequences

The FASTA header of IMGT/GENE-DB reference sequences is standardized. It contains 15 fields separated by '|':

1. IMGT/LIGM-DB accession number(s)
2. IMGT gene and allele name
3. species
4. IMGT allele functionality
5. exon(s), region name(s), or extracted label(s)
6. start and end positions in the IMGT/LIGM-DB accession number(s)
7. number of nucleotides in the IMGT/LIGM-DB accession number(s)
8. codon start, or 'NR' (not relevant) for non coding labels
9. +n: number of nucleotides (nt) added in 5' compared to the corresponding label extracted from IMGT/LIGM-DB
10. +n or -n: number of nucleotides (nt) added or removed in 3' compared to the corresponding label extracted from IMGT/LIGM-DB
11. +n, -n, and/or nS: number of added, deleted, and/or substituted nucleotides to correct sequencing errors, or 'not corrected' if sequencing errors have not been corrected
12. number of amino acids (AA): this field indicates that the sequence is in amino acids
13. number of characters in the sequence: nt (or AA)+IMGT gaps=total
14. partial (if it is)
15. reverse complementary (if it is)
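The 15 fields listed above can be read with a simple split on '|'. The sketch below is one way to do this in Python. The dictionary keys are names chosen for this example, the sample header is illustrative, and the handling of a trailing '|' at the end of the line is an assumption rather than something stated on this page.

```python
FIELD_NAMES = (
    "accession", "gene_and_allele", "species", "functionality", "label",
    "positions", "length_nt", "codon_start", "added_5prime",
    "changed_3prime", "corrections", "length_aa", "characters",
    "partial", "reverse_complementary",
)


def parse_header(line):
    """Split a FASTA header into its 15 '|'-separated fields."""
    text = line.strip()
    if text.startswith(">"):
        text = text[1:]
    parts = text.split("|")
    # Assumption: a header may end with a trailing '|', giving one empty extra part.
    if len(parts) == len(FIELD_NAMES) + 1 and parts[-1].strip() == "":
        parts = parts[:-1]
    if len(parts) != len(FIELD_NAMES):
        raise ValueError(f"expected 15 fields, found {len(parts)}")
    return {name: value.strip() for name, value in zip(FIELD_NAMES, parts)}


# Illustrative header for a V-REGION nucleotide sequence with IMGT gaps.
SAMPLE = (">M99641|IGHV1-18*01|Homo sapiens|F|V-REGION|188..483|296 nt|1"
          "| | | | |296+24=320| | |")

header = parse_header(SAMPLE)
gene, allele = header["gene_and_allele"].split("*")
```

Empty fields are returned as empty strings, so a test such as `header["length_aa"] == ""` distinguishes a nucleotide entry from an amino acid entry under the convention of field 12.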

Note that field 6 may be modified if:

- a nucleotide has been added in the IMGT/GENE-DB reference sequence in 5' of a label, to obtain a complete first codon (for example for C-GENE exons if the splicing frame is 1 or 2): the added nucleotide is indicated followed by a comma before the start position.
  See for example the reference sequences of Homo sapiens IGHA1 gene.
  Note that the number of added nucleotides in 5' is indicated in field 9.
- a nucleotide has been deleted in 3' of a label, to obtain a complete last codon (for example for C-GENE exons if the splicing frame is 1 or 2): the end position is decreased by the number of deleted nucleotides in 3'.
  See for example the reference sequences of Homo sapiens IGHA1 gene.
  Note that the number of removed nucleotides in 3' is indicated in field 10.
- a nucleotide has been added in 3' of a label, to obtain the complete genomic sequence (for example for a J-REGION reference sequence from cDNA): the end position is followed by a comma and the added nucleotides in 3'.
  See for example the reference sequences of Homo sapiens TRAJ47 gene.
  Note that the number of added nucleotides in 3' is indicated in field 10.
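The variants of field 6 described above can be recognised with a small pattern. This sketch assumes that start and end positions are written as `start..end`, and that added nucleotides appear as letters separated from the positions by a comma. The position values in the examples are hypothetical. Fields that list several ranges, and the form used for 5' and 3' extensions in label extraction, are not handled.

```python
import re

# Optional "nt," prefix, then "start..end", then optional ",nt" suffix.
_FIELD6 = re.compile(
    r"^(?:(?P<added5>[A-Za-z]+),)?"
    r"(?P<start>\d+)\.\.(?P<end>\d+)"
    r"(?:,(?P<added3>[A-Za-z]+))?$"
)


def parse_positions(field6):
    """Parse field 6 into start, end and any nucleotides added in 5' or 3'."""
    match = _FIELD6.match(field6.strip())
    if match is None:
        raise ValueError(f"unsupported field 6 value: {field6!r}")
    return {
        "added_5prime": match.group("added5") or "",
        "start": int(match.group("start")),
        "end": int(match.group("end")),
        "added_3prime": match.group("added3") or "",
    }


plain = parse_positions("188..483")
with_5prime = parse_positions("g,228..521")  # hypothetical positions
with_3prime = parse_positions("100..160,g")  # hypothetical positions
```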

Four examples are displayed below:
- Nucleotide sequences with IMGT gaps
- Amino acid sequences with IMGT gaps
- Nucleotide sequences (without gaps)
- Amino acid sequences (without gaps)

Nucleotide sequences with IMGT gaps:
Amino acid sequences with IMGT gaps:
Nucleotide sequences:
Amino acid sequences:

IMGT/GENE-DB reference sequences and gene orientation

An IMGT/GENE-DB reference sequence for a given IG or TR gene is provided in the 5' > 3' DNA strand orientation corresponding to the 'sense', 'plus' or 'coding strand' of that gene (DNA strand orientation).

The orientation (direct or opposite) of an IG or TR gene in a given IMGT locus is given in Locus Gene order (Genomic orientation)
IMGT Repertoire (IG and TR) > 1. Locus and genes > 3. Locus descriptions > Locus gene order
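To obtain a gene sequence in its sense orientation from a genomic sequence, a gene in opposite orientation has to be reverse complemented. A minimal sketch, assuming 1-based inclusive positions and an orientation value of either "direct" or "opposite" relative to the strand of the supplied sequence (the function names and sample data are hypothetical):

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def sense_strand(genomic_seq, start, end, orientation):
    """Return genomic_seq[start..end] in the 5' > 3' sense orientation of the gene."""
    if orientation not in ("direct", "opposite"):
        raise ValueError(f"unknown orientation: {orientation!r}")
    region = genomic_seq[start - 1:end]
    return region if orientation == "direct" else reverse_complement(region)


GENOMIC = "AAATGCAA"  # hypothetical genomic fragment

direct_gene = sense_strand(GENOMIC, 3, 6, "direct")
opposite_gene = sense_strand(GENOMIC, 3, 6, "opposite")
```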

Created: 31/01/2003
Last updated: 12/09/2019