IMGT Education

IMGT gene name nomenclature for IG and TR of human and other vertebrates

IMGT nomenclature for immunoglobulin (IG) and T cell receptor (TR) gene names of human and other jawed vertebrates is based on the 'CLASSIFICATION' concept of the IMGT-ONTOLOGY [1] and follows the Human Gene Mapping Nomenclature rules as closely as possible. Marie-Paule Lefranc, IMGT founder, applied this concept as early as 1988 to the human IGL and IGH loci [2,3], and in 1989 to all the genes of the human TRG locus [4-6].

IMGT gene names and IMGT gene definitions for the human IG [6-10] and TR genes [11,12] were approved in 1999 by HGNC, the Human Genome Organization (HUGO) Gene Nomenclature Committee.

Note that, in the HUGO symbols, slashes and parentheses are omitted, and capital letters replace the lower-case letters found in some provisional IMGT gene names. Otherwise the gene symbols and all the full names (including slashes and parentheses) are identical in IMGT and HUGO nomenclatures.
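The relationship between an IMGT gene name and the corresponding HUGO symbol can be sketched in a few lines of Python. This is a minimal illustration of the rule stated above, not an official conversion tool: the function name `imgt_to_hugo_symbol` is hypothetical, and approved symbols should always be checked against HGNC rather than derived.

```python
def imgt_to_hugo_symbol(imgt_name: str) -> str:
    """Derive a HUGO-style symbol from an IMGT gene name (illustrative only)."""
    # Slashes and parentheses are omitted in HUGO symbols.
    stripped = imgt_name.replace("/", "").replace("(", "").replace(")", "")
    # Capital letters replace any lower-case letters of provisional names.
    return stripped.upper()


if __name__ == "__main__":
    for name in ("IGHV(II)-1-1", "TRAV14/DV4", "IGKV2/OR2-1"):
        print(name, "->", imgt_to_hugo_symbol(name))
```

Names that contain neither slashes, parentheses nor lower-case letters are returned unchanged, which matches the statement that the symbols are otherwise identical.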

IG nomenclature schematic representation

TR nomenclature schematic representation

General rules

Gene names are written in capital letters, without Greek letters, commas or dots (hyphens are accepted, as for the HLA and MHC genes). IG and TR gene and allele names are not italicized in publications.

Gene names are defined per species, and the same gene symbol can be used in different species without implying any evidence of homology between the two genes. Species names are those defined in the NCBI taxonomy database (https://www.ncbi.nlm.nih.gov/taxonomy) [13,14].

1. The first three letters of a gene name indicate the locus

IGH, IGK, IGL, IGI, TRA, TRB, TRG, TRD [1-5].

2. The fourth letter indicates the Gene type

V (for a variable gene)
D (for a diversity gene)
J (for a joining gene)
C (for a constant gene)

However, for the IGH locus, the constant genes are designated by the letter (and, where applicable, number) corresponding to the encoded class (section 3.3.2) (IGHM, IGHD, IGHG3,...).
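Rules 1 and 2 can be illustrated with a small parser. The sketch below is a simplified reading of the rules, with assumptions stated in the comments: the function name `locus_and_gene_type` is hypothetical, and the way IGH names are disambiguated (a bare "IGHD" is read as the constant gene of class D, "IGHD" followed by further characters as a diversity gene) is a heuristic chosen for the example, not an IMGT rule.

```python
LOCI = ("IGH", "IGK", "IGL", "IGI", "TRA", "TRB", "TRG", "TRD")
GENE_TYPES = {"V": "variable", "D": "diversity", "J": "joining", "C": "constant"}


def locus_and_gene_type(gene_name: str) -> tuple[str, str]:
    """Return (locus, gene type) read from the first four letters of a name."""
    locus, fourth, rest = gene_name[:3], gene_name[3:4], gene_name[4:]
    if locus not in LOCI or not fourth:
        raise ValueError(f"not an IG/TR gene name: {gene_name!r}")
    if locus == "IGH":
        # Assumption for this sketch: V and J are always gene types, D is a
        # diversity gene only when more characters follow; anything else is
        # taken to be a class letter of a constant gene (IGHM, IGHG3, ...).
        if fourth in ("V", "J") or (fourth == "D" and rest):
            return locus, GENE_TYPES[fourth]
        return locus, "constant"
    if fourth not in GENE_TYPES:
        raise ValueError(f"unknown gene type letter in {gene_name!r}")
    return locus, GENE_TYPES[fourth]


if __name__ == "__main__":
    for name in ("IGHV1-2", "IGHD1-1", "IGHD", "IGHG3", "TRBC2", "IGKC"):
        print(name, locus_and_gene_type(name))
```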

3. The following characters correspond to

the subgroup number or the clan name for V,
the number of the cluster or set for D or J,
the number of the cluster or the subclass number for C.

3.1 Subgroup number or clan name for V genes

3.1.1 V genes that can be assigned to a subgroup

Variable genes (whatever their functionality: functional (F), open reading frame (ORF) or pseudogene (P)) of a given species are assigned to the same subgroup if they share at least 75% nucleotide sequence identity in their V-REGION. The subgroup is defined by an Arabic number.

The subgroup number is determined by comparison with the subgroups of the most phylogenetically closely related species annotated by IMGT. However, a new subgroup specific to the species may be created.

The CDR-IMGT lengths, the sequences of the L-V-GENE-UNIT and the promoter components may be taken into account to determine the subgroup number in ambiguous cases.

Ex: Homo sapiens IGHV2, Homo sapiens TRBV1
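The 75% identity criterion of section 3.1.1 can be sketched as follows. This is only an illustration under explicit assumptions: the two V-REGION sequences are supposed to be already aligned to the same length, positions where either sequence has a gap ("." or "-") are skipped, and the function names are hypothetical. How gaps are actually counted by IMGT is not stated in this document.

```python
GAP_CHARS = {".", "-"}


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over the non-gap positions of two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for base_a, base_b in zip(seq_a.upper(), seq_b.upper()):
        if base_a in GAP_CHARS or base_b in GAP_CHARS:
            continue  # assumption: gapped positions are ignored
        compared += 1
        if base_a == base_b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def same_subgroup(seq_a: str, seq_b: str, threshold: float = 75.0) -> bool:
    """True if the two V-REGIONs reach the subgroup identity threshold."""
    return percent_identity(seq_a, seq_b) >= threshold


if __name__ == "__main__":
    # Toy sequences, far shorter than a real V-REGION.
    print(percent_identity("ACGTACGT", "ACGTACGA"))  # 7 of 8 positions match
    print(same_subgroup("ACGTACGT", "ACGTTTTT"))     # 5 of 8 positions match
```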

3.1.2 V genes that cannot be assigned to a subgroup

Pseudogenes that cannot be assigned to a subgroup because the percent identity of their V-REGION is too low (degenerate and/or truncated pseudogenes):

are assigned to a clan for IG. The name of the clan is indicated by a Roman numeral in parentheses.
are identified by an uppercase Latin letter for TR, assigned by comparison with the closest species or, for distant species, in alphabetical order from 5' to 3'.

Ex: Homo sapiens IGHV(II)-1-1, Homo sapiens TRBVA

3.2 The number of the cluster or set for a D gene or a J gene

3.2.1 The number of the cluster or set for a D gene

Whenever possible, the sets are defined by comparison with the sets of the closest species, based on the percent identity of the nucleotide sequence of the D-GENE-UNIT for the IGH locus, or on the positions of the genes in the D-J-C clusters for the other loci. Clusters are numbered with Arabic numerals from 5' to 3'.

Locus representation: rabbit (*Oryctolagus cuniculus*) TRB (D-J-C-CLUSTER).

3.2.2 The number of the cluster or set for a J gene

Whenever possible, the sets are defined by comparison with the sets of the closest species, based on the percent identity of the nucleotide sequence of the J-REGION and on the positions of the genes in the D-J-C-CLUSTER or J-C-CLUSTER.

3.3 The number of the cluster or class and subclass of a constant gene

3.3.1 Cluster for the IGL, TRG and TRB loci

IGL: cluster J-C (IGLJ1, IGLC1, IGLJ2, IGLC2…) when there is more than one C-GENE

Locus representation: bovine (*Bos taurus*) IGL (J-C-CLUSTER).

TRG: cluster J-C (TRGJ1-1, TRGJ1-2, TRGC1, TRGJ2-1, TRGJ2-2, TRGC2…)

Locus representation: dog (*Canis lupus familiaris*) TRG (V-J-C-CLUSTER).

TRB: cluster D-J-C (TRBD1, TRBJ1-1 to TRBJ1-n, TRBC1, TRBD2, TRBJ2-1 to TRBJ2-n, TRBC2…)

Locus representation: dog (*Canis lupus familiaris*) TRB (D-J-C-CLUSTER).

3.3.2 Class and subclass for the IGH locus

The constant genes are designated by the letter (and, where applicable, number) corresponding to the encoded class and subclass (IGHM, IGHD, IGHG3,...).

The class and subclass of an IG constant gene are determined according to:

the percent identity of the sequences (all exons);
the prototypes (number of exons);
the relative order of the genes in the closest species.

If more than one gene shares the same subclass, letters in alphabetic order from 5’ to 3’ are appended to the name.

Ex: Gorilla gorilla gorilla IGHG3A, IGHG3B, IGHG3C
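The letter rule for genes sharing a subclass can be sketched as below. The input is assumed to be the list of class/subclass designations already determined for each constant gene, in 5' to 3' order; the function name `append_subclass_letters` is hypothetical, and the sketch does not handle more than 26 genes per subclass.

```python
from collections import Counter
from string import ascii_uppercase


def append_subclass_letters(genes_5_to_3: list[str]) -> list[str]:
    """Append A, B, C... (5' to 3') to designations shared by several genes."""
    totals = Counter(genes_5_to_3)
    seen = Counter()
    named = []
    for gene in genes_5_to_3:
        if totals[gene] > 1:
            # More than one gene shares this subclass: add the next letter.
            named.append(gene + ascii_uppercase[seen[gene]])
            seen[gene] += 1
        else:
            named.append(gene)
    return named


if __name__ == "__main__":
    # Illustrative order only, not an actual locus map.
    print(append_subclass_letters(["IGHM", "IGHG3", "IGHG3", "IGHG3", "IGHE"]))
```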

4. The last part of the gene name indicates the relative position of the gene within the locus and/or in the cluster

Mapped genes
Unmapped genes
Orphon genes

4.1 Mapped genes

4.1.1 Case of IG

For V genes:

Homo sapiens

For D genes:

See section 3.2.1.

Locus representation: human (*Homo sapiens*) IGH (D-CLUSTER).

For J genes:

See section 3.2.2.

Locus representation: horse (*Equus caballus*) IGH (J-CLUSTER).

For C genes:

For IGL, the position/order is given from 5' to 3' (ex. Homo sapiens IGLC1, IGLC2).
For IGK, the position is not indicated, as it is a locus with only one constant gene (ex. Homo sapiens IGKC).
For IGH, see section 3.3.2.

4.1.2 Case of TR

These rules apply only when there are several genes in a given subgroup and/or set/cluster.

For V genes: a dash (“-”) plus the relative position of the gene within a given subgroup, starting from the 5' end of the locus. Ex: Mus musculus TRAV3-1

For D genes: the relative position of the gene within the D-J-C clusters, starting from the 5' end of the locus. Ex: Mus musculus TRBD1

For J genes:

For TRG and TRD: the relative position of the gene within the J-C or D-J-C clusters, starting from the 5' end of the locus.
For TRB: a dash (“-”) plus the relative position of the gene within the J cluster, starting from the 5' end of the locus.
For TRA: the relative position of the gene within the D-J-C clusters, starting from the 5' end of the locus.

For C genes: the relative position of the gene within the C group, or within the C clusters, J-C clusters or cassettes, starting from the 5' end of the locus. Ex: Mus musculus TRBC2

4.2 Unmapped genes

Genes that have not yet been localized, or that are strongly suspected not to be in the right position, receive a temporary designation: the letter S (for subgroup or sequential, respectively), followed by the number of the last unmapped gene in the subgroup or cluster, incremented by one.

Remark: when an allele named with a temporary designation is 100% identical to a newly identified mapped gene, its name is replaced by the name of the mapped gene, and the sequence becomes a literature sequence of the mapped gene.
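The temporary designation for unmapped genes can be sketched as a small helper that finds the last "S" number already used and increments it. The function name `next_temporary_name` and the example names below are hypothetical and serve only to show the pattern; they are not taken from the IMGT database.

```python
import re


def next_temporary_name(prefix: str, existing_names: list[str]) -> str:
    """Return prefix + 'S' + (last unmapped number + 1) for a subgroup or cluster."""
    pattern = re.compile(re.escape(prefix) + r"S(\d+)")
    numbers = []
    for name in existing_names:
        match = pattern.fullmatch(name)
        if match:  # only names of the form <prefix>S<number> are counted
            numbers.append(int(match.group(1)))
    return f"{prefix}S{max(numbers, default=0) + 1}"


if __name__ == "__main__":
    known = ["IGHV1-2", "IGHV1S1", "IGHV1S2"]  # illustrative names only
    print(next_temporary_name("IGHV1", known))
```

Mapped names such as the first entry of `known` are ignored, so only earlier temporary designations influence the next number.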

4.3 Orphon genes

Orphon genes are IG and TR genes localized outside of the main locus. Following the same rules for subgroup/set/cluster (if known), "/OR" (for orphon) is added to the name, then the chromosome number (if known), a dash (“-”) and a gene number and/or letter.

Ex: Homo sapiens IGKV2/OR2-1, IGKV1/OR-1, TRBV22/OR9-2.
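The structure of an orphon gene name can be made explicit with a regular expression. This is a sketch based only on the rule and the three examples above; the pattern, the function name `parse_orphon`, and the choice to accept X and Y as chromosome identifiers are assumptions of the example.

```python
import re

# <base name>/OR<chromosome, optional>-<gene number and/or letter>
ORPHON = re.compile(
    r"^(?P<base>[^/]+)/OR(?P<chromosome>[0-9XY]*)-(?P<number>[0-9A-Za-z]+)$"
)


def parse_orphon(name: str):
    """Split an orphon gene name into its parts, or return None if it is not one."""
    match = ORPHON.match(name)
    if match is None:
        return None
    return {
        "base": match["base"],
        # An empty chromosome field means the chromosome is not known.
        "chromosome": match["chromosome"] or None,
        "number": match["number"],
    }


if __name__ == "__main__":
    for name in ("IGKV2/OR2-1", "IGKV1/OR-1", "TRBV22/OR9-2", "IGHV1-2"):
        print(name, parse_orphon(name))
```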

5. IMGT allele names

IMGT allele names comprise the IMGT gene name followed by an asterisk and a two-digit number.

The identification of IG and TR alleles is based on the sequence of the V-REGION, D-REGION, J-REGION and C-REGION. The first allele has the number *01 and is considered the reference sequence; other alleles are designated by increasing numbers (*02, *03, ...) based, if possible, on the chronological order of their publication and/or on confirmation of the data by different authors.

Ex: Homo sapiens IGHV1-2*01
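Splitting and building allele names follows directly from this definition. The helper names `split_allele` and `format_allele` are hypothetical, and the pattern only checks the "gene name, asterisk, two digits" shape described here, not whether the gene or allele exists.

```python
import re

ALLELE = re.compile(r"^(?P<gene>[^*\s]+)\*(?P<number>\d{2})$")


def split_allele(allele_name: str) -> tuple[str, int]:
    """Return (gene name, allele number) from a name such as 'IGHV1-2*01'."""
    match = ALLELE.match(allele_name)
    if match is None:
        raise ValueError(f"not an allele name: {allele_name!r}")
    return match["gene"], int(match["number"])


def format_allele(gene_name: str, number: int) -> str:
    """Build an allele name with a zero-padded two-digit number."""
    return f"{gene_name}*{number:02d}"


if __name__ == "__main__":
    print(split_allele("IGHV1-2*01"))
    print(format_allele("IGHV1-2", 2))
```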

There is only one reference sequence per allele. The IMGT Reference sequence should be the first published sequence that meets the following criteria:

in germline configuration or, for C genes, undefined configuration
complete sequence (X-GENE-UNIT)
mapped
annotated by IMGT

If the same allele is identified in other sequences, these are considered as literature sequences.
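The choice of a reference sequence can be sketched as a filter followed by "earliest published". Everything about the data model below is an assumption made for the example: the `SequenceRecord` fields, the use of a publication year as the ordering key, and the placeholder accession labels are hypothetical and do not describe the IMGT database schema.

```python
from dataclasses import dataclass


@dataclass
class SequenceRecord:
    accession: str          # placeholder label, not a real accession number
    publication_year: int   # assumption: year is enough to order publications
    configuration: str      # "germline", "rearranged" or "undefined"
    gene_type: str          # "V", "D", "J" or "C"
    complete: bool          # complete X-GENE-UNIT
    mapped: bool
    annotated_by_imgt: bool


def meets_criteria(record: SequenceRecord) -> bool:
    """Check the four reference-sequence criteria listed above."""
    allowed = {"germline"}
    if record.gene_type == "C":
        allowed.add("undefined")  # undefined configuration is accepted for C genes
    return (record.configuration in allowed and record.complete
            and record.mapped and record.annotated_by_imgt)


def pick_reference(records: list[SequenceRecord]):
    """Return the first published record meeting the criteria, or None."""
    candidates = [r for r in records if meets_criteria(r)]
    return min(candidates, key=lambda r: r.publication_year, default=None)


if __name__ == "__main__":
    records = [
        SequenceRecord("SEQ_A", 1990, "rearranged", "V", True, True, True),
        SequenceRecord("SEQ_B", 1995, "germline", "V", True, True, True),
        SequenceRecord("SEQ_C", 1992, "germline", "V", True, False, True),
    ]
    print(pick_reference(records))
```

Records carrying the same allele that are not selected would then be the literature sequences.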

Particular cases

Case 1: Homo sapiens TRGV genes

The Homo sapiens TRG locus contains 14 TRGV genes belonging to six subgroups. Gene names do not include the name of the subgroup. This nomenclature is used only for the annotation of human and of species phylogenetically closely related to human.

Locus representation: human (*Homo sapiens*) TRG locus [16]

Case 2: Mus musculus IGHV genes

For the Mus musculus IGHV genes, the last part of the gene name indicates the relative position of the gene within a given V subgroup, starting from the 3' end of the locus.

Case 3: TRAV genes which rearrange with TRAJ and TRDJ genes

If a TRAV gene is known to rearrange with TRAJ and TRDD-TRDJ genes, a slash (“/”) is added after the gene name, followed by the TRDV gene without the “TR” prefix.

Ex: Homo sapiens TRAV14/DV4*01
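The composition of such a name can be sketched as follows. The function name `trav_dv_name` is hypothetical; the sketch only applies the rule above (slash, then the TRDV name without its "TR" prefix) and does not check whether the two names really designate the same gene.

```python
def trav_dv_name(trav_gene: str, trdv_gene: str) -> str:
    """Combine a TRAV name and a TRDV name into a TRAV/DV gene name."""
    if not trav_gene.startswith("TRAV"):
        raise ValueError(f"expected a TRAV gene name, got {trav_gene!r}")
    if not trdv_gene.startswith("TRDV"):
        raise ValueError(f"expected a TRDV gene name, got {trdv_gene!r}")
    # Drop the "TR" prefix of the TRDV name and append it after a slash.
    return f"{trav_gene}/{trdv_gene[2:]}"


if __name__ == "__main__":
    print(trav_dv_name("TRAV14", "TRDV4") + "*01")
```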

Case 4: Duplicated and triplicated genes

For individual duplicated and triplicated genes
For duplicated or triplicated clusters
For duplicated locus

Case 4A: Individual duplicated and triplicated genes

Duplicated/triplicated genes are genes that share 100% identity in the V-REGION with an already characterized gene at a different position in the locus.

The gene name is the one assigned to the "initial" gene plus:

the letter "D" for a duplicated gene
the letter "N" for a triplicated gene

before the allele number.

Ex: Ovis aries TRAV9-1, TRAV9-1D, TRAV9-1N
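A minimal sketch of the suffix rule for individual duplicated and triplicated genes is given below. The function name `copy_names` is hypothetical, and the sketch assumes that identity of the copies has already been established.

```python
def copy_names(initial_gene: str, copies: int) -> list[str]:
    """Names of the initial gene and its duplicated/triplicated copies."""
    suffixes = ["", "D", "N"]  # initial gene, duplicated copy, triplicated copy
    if not 1 <= copies <= len(suffixes):
        raise ValueError("this sketch handles between one and three copies")
    return [initial_gene + suffix for suffix in suffixes[:copies]]


if __name__ == "__main__":
    names = copy_names("TRAV9-1", 3)
    print(names)
    # The letter comes before the allele number:
    print([f"{name}*01" for name in names])
```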

Case 4B: For duplicated or triplicated clusters

Duplicated/triplicated clusters are groups of genes that share a high percent identity in the V-REGION with an already characterized group of genes at a different position in the locus. Additionally, they are characterized by the following:

one or more genes share 100% identity in both clusters
both clusters show the same sequential order of the subgroups/clans
the distances between neighboring genes are equivalent in both clusters

The gene name follows the rule of case 4A.

Part of locus representation: sheep (*Ovis aries*) TRA/TRD locus, example for TRAV9-1D*01, TRAV13-2D*01

Case 4C: For duplicated locus

For a duplicated locus (Ex: Homo sapiens IGK), the gene name is the one assigned to the "initial" gene plus the letter "D" for the genes of the duplicated locus, placed after the number of the subgroup and before the dash, if any, and the relative position.

Ex: Homo sapiens IGKV3D-25*01
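The position of the letter "D" in a duplicated-locus name can be sketched with a regular expression. The pattern assumes a name made of four capital letters, a subgroup number and an optional dash-prefixed remainder; the function name `duplicated_locus_name` is hypothetical, and names with clans or other special forms are not handled.

```python
import re


def duplicated_locus_name(initial_gene: str) -> str:
    """Insert 'D' after the subgroup number, before the dash and position."""
    match = re.fullmatch(r"([A-Z]{4}\d+)(-.+)?", initial_gene)
    if match is None:
        raise ValueError(f"unsupported gene name for this sketch: {initial_gene!r}")
    locus_type_subgroup, remainder = match.group(1), match.group(2) or ""
    return f"{locus_type_subgroup}D{remainder}"


if __name__ == "__main__":
    print(duplicated_locus_name("IGKV3-25"))
```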

Case 5: Gene insertion(s) and deletion(s)

Case 5A: Gene insertion(s)

Gene insertions and deletions in a locus are determined by comparison with an already annotated reference locus in a given species.

Common genes between the reference locus and the new one are identified: they are named according to the reference locus.

The inserted genes are named according to the classical procedure and to the following rules:

For V genes, the inserted gene is named by its subgroup followed by the position of the closest conserved gene, located 5' for IG loci and 3' for TR loci. A sub-position is then created by adding a second dash (“-”) and a number, which is incremented if further genes are discovered later. No more than two dashes are accepted in an IMGT V gene name.
For J and C genes, letters of the Latin alphabet are added, in order from 5' to 3'.

Part of locus representation: dog (*Canis lupus familiaris*) IGH locus, example for inserted genes IGHV3-21-1, IGHV3-5-1.

Part of locus representation: western lowland gorilla (*Gorilla gorilla gorilla*) IGL locus, example for inserted genes IGLJ2A, IGLC2A.
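The naming of inserted genes can be sketched with two small helpers, one for V genes and one for J and C genes. Both function names are hypothetical, and the position of the closest conserved gene (5' for IG loci, 3' for TR loci) is assumed to have been determined beforehand and is passed in as a number.

```python
from string import ascii_uppercase


def inserted_v_gene_name(subgroup: str, anchor_position: int,
                         sub_position: int = 1) -> str:
    """Subgroup, position of the closest conserved gene, then a sub-position."""
    name = f"{subgroup}-{anchor_position}-{sub_position}"
    if name.count("-") > 2:
        raise ValueError("no more than two dashes are accepted for a V gene")
    return name


def inserted_jc_gene_name(base_name: str, index_5_to_3: int) -> str:
    """Append a Latin letter (A for the first inserted gene from 5' to 3')."""
    return base_name + ascii_uppercase[index_5_to_3]


if __name__ == "__main__":
    print(inserted_v_gene_name("IGHV3", 21))   # first insertion at anchor 21
    print(inserted_jc_gene_name("IGLJ2", 0))
    print(inserted_jc_gene_name("IGLC2", 0))
```

A later insertion next to the same conserved gene would call `inserted_v_gene_name` with `sub_position=2`, which corresponds to the incrementing described in the rule.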

Case 5B: Gene deletion(s)

When a gene of the reference locus is missing from the locus being annotated, its name cannot be reused. The absence is treated as a copy number variation (CNV) by deletion.

Case 6: Mus musculus nomenclature

To establish the nomenclature for new Mus musculus strains, for instance, the existing rules and nomenclature were taken into account:

The genes of the new strains were compared with those of the IMGT annotated strains (C57BL/6J and 129/Sv, the IMGT reference repertoire) to look for similarities.
Genes that show 100% similarity with an existing IMGT annotated gene (the IMGT reference repertoire) serve as “anchors”, allowing the analysis of their environment, that is, their neighboring genes.
To subsequently identify new alleles of existing genes, the best matches for each gene of a new strain are analyzed in terms of their L-V-GENE-UNIT (L-PART1 to V-RS), as well as their recombination signals and the leader of the V-REGIONs.
Following this analysis of the similarity between the new genes and the IMGT annotated genes (the IMGT reference repertoire), blocks of similar genes are gradually built around the identified “anchors”.
Genes located between identified blocks are treated as insertions.

The previous nomenclature page (19/09/2024) is still available as an archive.