IMGT®, the international ImMunoGeneTics information system®

Frequently asked questions

Why are immunoglobulin and T cell receptor genes described as "genes", instead of "gene segments" or "segments"?
Immunoglobulin and T cell receptor genes were accepted as "genes" by the Human Genome Organisation (HUGO) Nomenclature Committee (HGNC) in 1999. It was the only way to have the IG and TR genes entered into the general genome databases (LocusLink, GDB, GeneCards, Entrez Gene) and to define and characterize alleles in a standardized way. This was accepted because different definitions of a gene coexist in biology. Moreover, each IG or TR gene has its own promoter and can be transcribed as an independent transcript.
Why are abbreviations of IG and TR used, instead of Ig and TcR?
The official nomenclature for the immunoglobulin and T cell receptor genes (approved by HGNC in 1999) starts with the 2 letters "IG" and "TR". IG and TR are therefore used when referring to the genes, loci and chains, whereas Ig and TcR are used for a more general description. The abbreviation TCR should be avoided because, being in capital letters, it creates confusion with gene names.
Why are immunoglobulin and T cell receptors in "subgroups", rather than in "families"?
A subgroup is part of a group. Both a subgroup and a group are well defined entities, whereas a family is not.
Why is there sometimes capitalization in midsentence?
Capital letters, if in midsentence, indicate sections of the IMGT Repertoire (for example: Alignments of alleles, Tables of alleles, etc.).
Is it judicious to use an L-PART1 oligonucleotide to amplify V-REGIONs?
L-PART1 corresponds to the exon that encodes the first part (the longest one) of the leader. An oligonucleotide in L-PART1 will work well on cDNAs, for the amplification of V-REGIONs. On genomic DNAs, such an oligonucleotide will also amplify the intron between L-PART1 and L-PART2. See Variable region representation in gDNA and cDNA.
Where can I find known human IG allotype sequences?
For Gm allotype sequences, the IMGT/LIGM-DB accession numbers of the sequences that correspond to the Gm allotypes are indicated in "Gene tables: Human IGHC" in IMGT Repertoire. The corresponding IGHG allele sequences in FASTA format (per exon) are available from IMGT/GENE-DB.
For Km allotypes, the correspondence between Km alleles and IGKC allele names is available in "Allotypes: Human IGKC" in IMGT Repertoire. The corresponding IGKC allele sequences in FASTA format are available from IMGT/GENE-DB.
What are the differences between allotypes and alleles?
The definition of "allotypes" requires that antibody reagents are available to determine the allotypes serologically. If the determination is only done at the sequence level, the polymorphisms have to be described as "alleles". This does not prevent a correspondence with allotypes from being established, provided the allele-allotype correspondence has been experimentally proven, or the individual sequence is identical to a sequence for which it has been demonstrated.
Why are there differences in the V and J assignments of rearranged human IG and TR sequences, between IMGT/LIGM-DB and the generalist databases GenBank/EMBL/DDBJ, although the flat file accession numbers are identical?
IMGT/LIGM-DB provides annotated flat files and uses the official nomenclature of the human immunoglobulin (IG) and T cell receptor (TR) genes, defined by IMGT and approved by the HUGO Nomenclature Committee (HGNC) in 1999. The official nomenclature is used by GeneCards, LocusLink and Entrez Gene at NCBI: 28644. The IMGT/V-QUEST tool analyses rearranged IG or TR sequences and provides the correct gene and allele assignment of the closest germline genes. Citing IMGT/V-QUEST: PMID:15215425.
See Correspondence between nomenclatures and IMGT Index: Nomenclature for more information on nomenclatures.
The reference books are the following:
Lefranc, M.-P. and Lefranc, G., The Immunoglobulin FactsBook, Academic Press, 458 pages (2001) ISBN:012441351X
Lefranc, M.-P. and Lefranc, G., The T cell receptor FactsBook, Academic Press, 398 pages (2001) ISBN:0124413528.
Is it possible to observe sequences with additional amino acids in the CDR-IMGT?
Yes, sequences may have additional amino acids in the CDR-IMGT. For example, the IGIV sequences of Oncorhynchus mykiss (IMGT Repertoire).
Is it possible to selectively obtain, from IMGT/LIGM-DB, the sequences of antibodies that are known to bind to antibodies?
You can query IMGT/LIGM-DB on "Specificities" (module: "Taxonomy,..."). For example: anti-idiotype, anti-IgG rhumatoid factor, anti-Fc...
If several lines have a given specificity in common, you need to make a query for each line individually. For example, for "anti-idiotype":
anti-idiotype
anti-idiotype (A48: levan-specific BALB/c myeloma protein ABPC48)
anti-idiotype (Ab3) (anti-Neisseria meningitidis polysaccharide C)
anti-idiotype (anti-sigma receptor)
anti-idiotype (anti-tumor effect)
anti-idiotype > cocaine
anti-idiotype > cyclosporine (Cs)
anti-idiotype HLA class II
anti-idiotype [human]
anti-idiotype, anti-cancer [human]
anti-idiotype, anti-epidermal growth factor receptor
Specificities available in IMGT/LIGM-DB are listed here.
Where, on the IMGT site, can we find information about the antibody diversity calculation?
Click on the different links in "IG and TR number of genes: Human". The first 3 links lead to sections in the same page:
Total number of IG and TR genes
Number of functional IG and TR genes
Number of genes in the IMGT genome analysis tools
The other links lead to other pages:
Potential germline repertoires
Questions and answers (IMGT Education): Nomenclature and overview of the human immunoglobulin genes
Questions and answers (IMGT Education): Nomenclature and overview of the human T cell receptor genes
There is also some information in French at IMGT Education > Questions and Answers > Gènes et locus
Is it possible to get restriction maps for the IG and TR loci?
The restriction maps of the IG and TR loci are not stored on the IMGT site.
One way to proceed is to go to IMGT/LocusView. This will allow you to identify the clones that contain the genes you are looking for (if you are only interested in a given gene, you can query IMGT/GeneSearch instead of IMGT/LocusView). You can then retrieve the sequences containing the genes of interest (clicking on the clones, shown in blue, gives access to the entries in IMGT/LIGM-DB, and therefore to the sequences in FASTA format).
You can then analyse the sequences with a tool such as RESTRICT (EMBOSS).
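As a rough illustration of that last step, the sketch below scans a retrieved sequence for a few recognition sites. It is not the EMBOSS RESTRICT program: it is a plain substring search, and the small enzyme table and the function name `find_sites` are chosen here for illustration only.

```python
# Illustrative subset of enzymes; a real analysis would use a full enzyme database.
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def find_sites(sequence, sites=SITES):
    """Return, for each enzyme, the 1-based start positions of its recognition site."""
    sequence = sequence.upper()
    hits = {}
    for enzyme, motif in sites.items():
        positions = []
        start = sequence.find(motif)
        while start != -1:
            positions.append(start + 1)  # convert to 1-based numbering
            start = sequence.find(motif, start + 1)
        hits[enzyme] = positions
    return hits


# Hypothetical sequence fragment, not taken from an IMGT/LIGM-DB entry.
print(find_sites("ttgaattcaaggatccgaattc"))
```

Only the strand given as input is scanned, which is sufficient for palindromic sites such as the three listed here.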
Which gene to choose, in IMGT/V-QUEST results, when two genes give an identical score?
A look at the IMGT/V-QUEST alignment is useful to check where the differences are between the input sequence and the two germline genes, and possibly to decide which gene to choose.
For the human IGKV genes, if two germline genes, one from the proximal cluster and one from the distal cluster, give an identical score with the input sequence, it is preferable to select the gene of the proximal cluster (as genes of the distal cluster are rarely used).
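The preference for the proximal cluster can be sketched as a simple tie-break. This is only an illustration: the scores are hypothetical, the function name `pick_gene` is invented here, and the code assumes that distal-cluster IGKV genes are recognizable by a "D" after the subgroup number (for example IGKV1D-39 versus IGKV1-39). It does not replace a look at the alignment.

```python
import re

# Assumption: distal-cluster human IGKV genes carry a "D" after the subgroup number.
DISTAL_IGKV = re.compile(r"^IGKV\d+D-")


def pick_gene(hits):
    """hits: list of (gene_name, score). Return one gene among the top scorers,
    preferring a proximal-cluster gene when scores are tied."""
    best = max(score for _, score in hits)
    tied = [gene for gene, score in hits if score == best]
    proximal = [gene for gene in tied if not DISTAL_IGKV.match(gene)]
    return (proximal or tied)[0]


# Hypothetical scores for illustration.
print(pick_gene([("IGKV1D-39", 1390), ("IGKV1-39", 1390), ("IGKV1-5", 1200)]))
```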
How to compare IG V sequences from a given species with human ones?
You can compare:
  1. the IG V gene sequences from a species with the human ones using IMGT/V-QUEST.
  2. the IG V amino acid alignment with the human IGHV, IGKV and IGLV IMGT Protein display (germline sequences) to identify if unusual amino acids found in the analysed sequences are found in human V genes.
  3. the IG V gene sequences from a species with the human IG productively rearranged V sequences (Pommié C. et al 2004 PMID:14872534).
    • How many of the 19 conserved positions (chemical characteristics in Table 2C and Table 3A of Pommié et al 2004) between human IGHV and IGKV/IGLV are conserved in the analysed sequences?
    • How many of the 41 conserved positions between human IGKV and IGLV (chemical characteristics in Table 2C and Table 3B- yellow+pink+light green) are conserved in the analysed sequences?
    • Are the specific positions in human IGKV or IGLV conserved in the analysed sequences? For human IGKV, for instance, the four specific positions are 7: hydroxyl, 24: basic, 86: acidic, and 87: F (Pommié C. et al 2004 PMID:14872534).
How to determine the CDR3-IMGT length of a germline V gene?
The length of the CDR3-IMGT is expressed as the number of amino acids, or number of complete codons, following the 2nd-CYS at position 104.
Between the end of the last complete codon in 3' and the V-HEPTAMER, there are frequently one or two nucleotides. These nucleotides belong to the V-REGION and are taken into account for a nucleotide comparison, but they are not counted in the CDR3-IMGT length.
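The counting rule can be written as a short sketch. The function name and the example nucleotides are hypothetical; the input is assumed to be the part of the germline V-REGION located 3' of the 2nd-CYS 104 codon, up to but not including the V-HEPTAMER.

```python
def germline_cdr3_length(nt_after_2nd_cys):
    """Return (CDR3-IMGT length in complete codons, leftover nucleotides).

    The leftover nucleotides (0, 1 or 2) belong to the V-REGION but are
    not counted in the CDR3-IMGT length.
    """
    complete_codons, leftover = divmod(len(nt_after_2nd_cys), 3)
    return complete_codons, leftover


# Hypothetical 23-nucleotide stretch: 7 complete codons plus 2 nucleotides.
print(germline_cdr3_length("caacagagttacagtacccctcc"))
```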
How to represent CDR-IMGT lengths?
CDR-IMGT lengths should be represented as described in Lefranc et al 2003 PMID:12477501, that is, as numbers separated by dots, between brackets.
For example, [6.3.7] indicates that the CDR1-IMGT has a length of 6 amino acids, the CDR2-IMGT has a length of 3 amino acids and the CDR3-IMGT has a length of 7 amino acids. The same type of representation is used for germline and rearranged genes, and the information (subgroup or gene name, and configuration) should be provided with the CDR-IMGT lengths.
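A minimal sketch of this notation, with function names invented here for illustration, shows how the three lengths map to and from the bracketed form:

```python
import re


def format_cdr_lengths(cdr1, cdr2, cdr3):
    """Write three CDR-IMGT lengths as numbers separated by dots, between brackets."""
    return f"[{cdr1}.{cdr2}.{cdr3}]"


def parse_cdr_lengths(text):
    """Read a string such as '[6.3.7]' back into a tuple of three integers."""
    match = re.fullmatch(r"\[(\d+)\.(\d+)\.(\d+)\]", text.strip())
    if match is None:
        raise ValueError(f"not a CDR-IMGT length notation: {text!r}")
    return tuple(int(number) for number in match.groups())


print(format_cdr_lengths(6, 3, 7))
print(parse_cdr_lengths("[6.3.7]"))
```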
What are the recommendations for correctly representing V-REGION IMGT Protein displays?
Check the following items:
  • the IMGT numbering is correct (Lefranc et al 2003 PMID:12477501)
  • a space is added between the FR-IMGT and CDR-IMGT (between 26 and 27, between 38 and 39, between 55 and 56, between 65 and 66, between 104 and 105)
  • dots are added in all unoccupied positions according to the IMGT unique numbering (for example 33 to 38, 59 to 65, 73, 81, 82, 112 to 117 for human IGKV genes)
  • dashes are used for similarity (for the version in which only different amino acids are shown).
  • the figure header is completed with FR1-IMGT, CDR1-IMGT, ... as in the example: Protein display: Human IGK V-REGIONs
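The spacing, dot and dash conventions in this checklist can be sketched as follows. This is an illustration under stated assumptions, not the code used by IMGT: the function names are invented here, residues are supplied as a mapping from IMGT position to amino acid, and the display stops at position 117.

```python
# Last position of each region after which a space is inserted (from the checklist).
REGION_ENDS = (26, 38, 55, 65, 104)


def protein_display(residues, last_position=117):
    """Build one display line: dots for unoccupied positions, spaces between regions."""
    parts = []
    for position in range(1, last_position + 1):
        parts.append(residues.get(position, "."))
        if position in REGION_ENDS and position != last_position:
            parts.append(" ")
    return "".join(parts)


def mask_identical(display, reference):
    """Replace residues identical to the reference line with dashes; keep dots and spaces."""
    return "".join(
        "-" if a == b and a not in ". " else a for a, b in zip(display, reference)
    )


# Toy example: every occupied position holds "A"; unoccupied positions are
# those quoted above for human IGKV genes.
unoccupied = set(range(33, 39)) | set(range(59, 66)) | {73, 81, 82} | set(range(112, 118))
toy = {p: "A" for p in range(1, 118) if p not in unoccupied}
print(protein_display(toy))
```

Counting the non-dot characters in the second, fourth and sixth blocks of the printed line gives the CDR-IMGT lengths of the toy sequence.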
Where can I find information about the antibody diversity calculation with clear figures and legends?
You may click on the 6 different links in "IG and TR number of genes: Human". The first 3 links lead to sections in the same page:
  • Total number of IG and TR genes
  • Number of functional IG and TR genes
  • Number of genes in the IMGT genome analysis tools
The other 3 links lead to other pages:
  • Potential germline repertoires
  • Questions and answers (IMGT Education): Nomenclature and overview of the human immunoglobulin genes
  • Questions and answers (IMGT Education): Nomenclature and overview of the human T cell receptor genes
There is also some information in French at IMGT Education > Questions and Answers > Gènes et locus (Correction)
Is it possible to retrieve flanking sequence at the 5' and/or 3' ends of IMGT labels that describe IMGT/GENE-DB annotated sequences?
Yes, flanking sequences at the 5' and/or 3' ends of the IMGT labels can be retrieved, in FASTA format, by querying the IMGT/GENE-DB entry section, "Choose your display > IMGT label extraction from IMGT/LIGM-DB reference sequences". For more information: IMGT label extraction from IMGT/LIGM-DB reference sequences.
What are "P" nucleotides in a V-J or V-D-J junction?
"P" nucleotides refer to nucleotides that are found in V-J or V-D-J junctions and that are palindromic to the last (3') nucleotides of the germline V-REGION, to the first (5') nucleotides of the germline J-REGION, or to the (5' or 3') ends of the germline D-REGION of immunoglobulin or T cell receptor genes. "P" nucleotides are only identified in junctions in which the V-REGION, D-REGION or J-REGION has not been subjected to exonuclease activity (intact ends of the respective V, D, and J coding regions).
Formation of "P" nucleotides:
The "P" nucleotides result from the opening of the DNA hairpin formed during the V-(D)-J rearrangement, when this opening does not occur exactly at the tip of the hairpin ("P" nucleotides can only be identified if this hairpin opening is not followed by exonuclease activity on the sequence ends).
Example:
  1. Cut at the end of the 3' V-REGION or at the 3' D-REGION (during the V-(D)-J rearrangement):
    5'  T - C - A - G
    3'  A - G - T - C
    
  2. Link between G and C forming a hairpin:
    5'  T - C - A - G
    3'  A - G - T - C
    }
  3. Opening of the hairpin. If the cut occurs for instance between C and A on the upper strand:
    5'  T - C
    3'  A - G - T - C - G - A
    
  4. The upper strand is completed. In the final sequence, C - T are designated as "P" nucleotides:
    5'  T - C - A - G - C - T
    3'  A - G - T - C - G - A
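
The four steps can be reproduced with a short sketch. The function names are invented here, and the model is deliberately simple: the hairpin is opened a given number of nucleotides away from its tip on the upper strand, no exonuclease trimming occurs, and the upper strand is then filled in.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(sequence):
    return "".join(COMPLEMENT[base] for base in reversed(sequence.upper()))


def open_hairpin(coding_end, nick_offset):
    """Return (final upper strand, P nucleotides).

    coding_end:  upper-strand sequence (5' to 3') of the intact coding end.
    nick_offset: number of upper-strand nucleotides between the cut and the
                 hairpin tip (0 means the hairpin opens exactly at the tip).
    """
    coding_end = coding_end.upper()
    if not 0 <= nick_offset <= len(coding_end):
        raise ValueError("nick_offset must lie within the coding end")
    # Nucleotides that stay attached to the lower strand after the cut.
    transferred = coding_end[len(coding_end) - nick_offset:]
    # Filling in the upper strand adds their reverse complement.
    p_nucleotides = reverse_complement(transferred)
    return coding_end + p_nucleotides, p_nucleotides


print(open_hairpin("TCAG", 2))
```

With the coding end T-C-A-G opened two nucleotides away from the tip, the sketch returns the final upper strand TCAGCT and the "P" nucleotides CT, as in step 4. An opening exactly at the tip returns no "P" nucleotides.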
    
Lafaille J.J. et al. Cell, 59: 859-870 (1989) PMID:2590942
Lewis S.M. Proc Natl Acad Sci U S A., 91: 1332-1336 (1994) PMID:8108412
How are the CDR lengths defined in IMGT?
The CDR-IMGT length is based on the IMGT unique numbering for V-DOMAIN.
  1. This numbering has been defined following extensive alignment analysis and taking into account the structural data.
    The rules for the IMGT unique numbering are described in: Lefranc, M.-P. et al. "IMGT unique numbering for immunoglobulin and T cell receptor variable domains and Ig superfamily V-like domains". Dev. Comp. Immunol., 27, 55-77 (2003) PMID:12477501
  2. The same IMGT unique numbering for V-DOMAIN is used whatever the species, whatever the receptor (IG or TR), and whatever the chain (IGH heavy, IGK kappa or IGL lambda, IGI iota in teleostei; TRA alpha, TRB beta, TRG gamma, TRD delta). Moreover, the IMGT unique numbering for V-DOMAIN is used, by extension, for the V-LIKE-DOMAIN of the immunoglobulin superfamily IgSF proteins.
    IMGT unique numbering for all IG and TR V-REGIONs of all species: interest for structure and evolution
    IMGT unique numbering for V-DOMAIN and V-LIKE-DOMAIN
  3. In IMGT/3Dstructure-DB, for example, the CDR-IMGT lengths of 1hzh are, for the VH domain [8.8.20], and for the V-KAPPA domain [7.3.9].
    In the IMGT/3Dstructure-DB card, you can click and see the IMGT Colliers de Perles, on one or two layers. In the IMGT Collier de Perles on two layers, hydrogen bonds experimentally determined from 3D structures are represented by green lines.
  4. Correspondences between the Kabat numbering and the IMGT unique numbering are available in PMID:12477501 and on the IMGT site. For example: Correspondence between V numberings
  5. Correspondences between the Chothia structures and the IMGT unique numbering are available on the IMGT site: FR-IMGT and CDR-IMGT lengths (V-REGION and V-DOMAIN)
  6. In practice, we only use the IMGT unique numbering (and have been using it for more than 8 years) as it deals remarkably well with the sequences and structures. Moreover, we have strong evidence from evolutionary data that it is correct!
What are the differences between IgBlast and IMGT/V-QUEST?
The algorithms behind IgBlast and IMGT/V-QUEST are different, and therefore the scores are different. However, the main differences are more on the biological side, and therefore on the interpretation that derives from it:
  1. IMGT/V-QUEST uses the standardized IMGT nomenclature (gene name), which was approved by the Human Genome Organisation (HUGO) Nomenclature Committee (HGNC) in 1999, entered in LocusLink at NCBI in 2000, and is now in Entrez Gene at NCBI (with direct links to IMGT/GENE-DB).
  2. IMGT/V-QUEST uses the standardized amino acid (and codon) numberings, according to the IMGT unique numbering for V-DOMAIN.
  3. IMGT/V-QUEST uses the delimitations of the frameworks (FR-IMGT) and complementarity determining regions (CDR-IMGT), which are identical whatever the receptor type (IG and TR), whatever the chain type and whatever the species. This is quite important when determining the number of IG mutations in the different regions.
  4. Last but not least, IMGT/V-QUEST provides the information by comparison to all the available alleles. This definition of the alleles represents a huge amount of work, but a very valuable one, as it allows the germline diversity to be taken into account.
Is the IMGT gene name valid for both genomic and cDNA sequences?
Yes, the IMGT gene name (CLASSIFICATION concept of IMGT-ONTOLOGY) is valid for both genomic DNA and cDNA sequences. It is also valid for amino acid sequences, and protein 2D (IMGT Colliers de Perles) and 3D structures.
Note that the names of the genes and subgroups depend on the species. For example, in human, TRAV5 is also the subgroup name (as there is only one gene in the subgroup). In contrast, in mouse, TRAV5 is only the subgroup name (as there are several genes in the subgroup).
The lists of the genes per locus and per species are in Gene Tables (IMGT Repertoire for IG and TR).
All the human and mouse IG and TR genes are known. The "Gene tables" lists are therefore comprehensive. All the human and mouse genes are also available in IMGT/GENE-DB. Known IMGT/LIGM-DB cDNA sequences for each gene are available in a section at the bottom of each IMGT/GENE-DB gene entry.
When do we use "V-ALPHA" or "V-BETA"?
V-ALPHA and V-BETA are domain labels, but these labels are also used for the corresponding nucleotide and amino acid sequences. Thus, V-ALPHA refers to the TRA V-J-REGION and V-BETA refers to the TRB V-D-J-REGION (nucleotide or amino acid sequence, protein 2D and 3D structure).
V-ALPHA, V-BETA, V-J-REGION and V-D-J-REGION are written in capital letters as they are IMGT standardized labels (DESCRIPTION concept of IMGT-ONTOLOGY). Labels are independent of the species.
The V-ALPHA and V-BETA amino acid sequences extend from the V-REGION amino acid 1 to the J-REGION most downstream amino acid. The V-ALPHA and V-BETA nucleotide sequences extend from the V-REGION codon 1 to the J-REGION most downstream nucleotide (that is the nucleotide downstream of the most 3' codon).
A V-ALPHA or a V-BETA domain is characterized, as an IG V-DOMAIN, by its CDR-IMGT lengths (written between brackets and separated by dots, for example [6.6.10]). Dev. Comp. Immunol., 27, 55-77 (2003) pdf available at http://www.imgt.org/IMGTScientificChart/Numbering/IMGTIGVLsuperfamily.html
The V-ALPHA and V-BETA labels, like the IG V-DOMAIN labels, are important for the representation of the variable domain 3D structures in IMGT/3Dstructure-DB. For instance, in the entry 1ao7, they make it possible to represent the IMGT Collier de Perles of the V-ALPHA and V-BETA, according to the IMGT unique numbering for V-DOMAIN (NUMEROTATION concept of IMGT-ONTOLOGY).
IMGT Colliers de Perles on one layer
IMGT Colliers de Perles on two layers
How should the V-ALPHA and V-BETA of rearranged cDNA sequences, amino acid sequences, protein 2D and 3D structures be referred to?
The V-ALPHA and V-BETA of rearranged cDNA sequences, amino acid sequences, protein 2D (IMGT Collier de Perles) and 3D structures are referred to in an identical way, that is, by species, gene names and CDR-IMGT lengths.
For instance, taking the example of 1ao7:
  • the V-ALPHA of the 3D structure but also of the corresponding rearranged cDNA sequence, amino acid sequence and protein 2D is referred to as Homo sapiens TRAV12-2-TRAJ24 [6.5.11].
  • the V-BETA of the 3D structure but also of the corresponding rearranged cDNA sequence, amino acid sequence and protein 2D is referred to as Homo sapiens TRBV6-5-TRBD2-TRBJ2-7 [5.6.14].
Homo sapiens, cDNA or amino acid, or protein: to identify.
V-ALPHA, or V-BETA: to describe.
TRAV12-2-TRAJ24, or TRBV6-5-TRBD2-TRBJ2-7: to classify.
[6.5.11] or [5.6.14]: to number.
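As an illustration of how such a reference breaks down into species, gene names and CDR-IMGT lengths, the sketch below splits the two 1ao7 strings quoted above. The regular expressions and the function name are assumptions made here; in particular, gene names are assumed to start with "IG" or "TR" and to be joined by hyphens.

```python
import re

# species, then hyphen-joined gene names, then bracketed CDR-IMGT lengths
REFERENCE = re.compile(
    r"^(?P<species>.+?)\s+(?P<genes>\S+)\s+\[(?P<cdr>\d+(?:\.\d+){2})\]$"
)
# Split only on hyphens that are followed by a new gene name (IG... or TR...).
GENE_SEPARATOR = re.compile(r"-(?=(?:IG|TR)[A-Z])")


def parse_domain_reference(text):
    match = REFERENCE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized domain reference: {text!r}")
    return {
        "species": match["species"],
        "genes": GENE_SEPARATOR.split(match["genes"]),
        "cdr_lengths": tuple(int(n) for n in match["cdr"].split(".")),
    }


print(parse_domain_reference("Homo sapiens TRAV12-2-TRAJ24 [6.5.11]"))
print(parse_domain_reference("Homo sapiens TRBV6-5-TRBD2-TRBJ2-7 [5.6.14]"))
```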
Is it possible to study rearrangements of a pseudogene or unusual sequences (such as translocated IG with other genes)?
Yes, it is possible to study rearrangements of pseudogenes or unusual sequences (such as translocated IG with other genes) for the species whose reference sequences are in IMGT/GENE-DB, that is human and mouse. On these sequences, IMGT/V-QUEST does not work, but you can use BLAST2 at CINES on IMGT/GENE-DB reference sequences.
Compare your sequence against IMGT
How to find the correspondence between a "previous" gene name and the current IMGT gene name?
You have two ways to find the correspondences:
  1. either go to the "Gene tables" in IMGT Repertoire
    For example, for a human IGHV gene, go to Gene table: human (Homo sapiens) IGHV in IMGT Repertoire and use "Find in this page" (Ctrl+F) to search for the "previous" gene name (or its accession number, if known).
  2. or search the "IMGT domain" with Google, on the IMGT Home page, using the "previous" gene name (or its accession number, if known), which leads you to "Alignments of alleles"
    For example: Alignment of alleles: human (Homo sapiens) IGHV4-34
    This approach is valid for any old name of genomic genes which are in the IMGT Repertoire (comprehensive for human and mouse).
What defines an allele?
  1. A single nucleotide difference in the coding region (sequence polymorphism) is sufficient to define a new allele. IMGT allele nomenclature for sequence polymorphisms.
  2. A different number of exons (polymorphism by insertion/deletion) is also sufficient to define a new allele. For example, Homo sapiens IGHG3*11 and IGHG3*12 have identical genomic sequences for the coding regions they share, but they differ by the number of hinge exons. The IGHG3*11 allele has four hinge exons (H1, H2, H3, H4), whereas the IGHG3*12 allele has three hinge exons (H1, H2, H4).
  3. Can new alleles be submitted from NGS?
    'Putative' alleles identified by NGS from V-(D)-J rearrangements cannot be accepted as new alleles.
    They should be confirmed.
    Authors are strongly encouraged to confirm the nucleotide differences by genomic sequencing of the corresponding germline gene in order to keep an updated IMGT reference directory for the scientific community.
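For the sequence polymorphism case (item 1), a comparison of two aligned coding sequences can be sketched as below. The function name and the two short sequences are hypothetical, and the sketch assumes sequences of equal length, so it does not cover insertion/deletion polymorphisms such as a different number of exons.

```python
def nucleotide_differences(sequence_a, sequence_b):
    """Return the 1-based positions at which two aligned sequences differ."""
    if len(sequence_a) != len(sequence_b):
        raise ValueError("sequences must be aligned and of equal length")
    pairs = zip(sequence_a.lower(), sequence_b.lower())
    return [index for index, (a, b) in enumerate(pairs, start=1) if a != b]


# Hypothetical coding fragments differing by a single nucleotide.
print(nucleotide_differences("atggactgg", "atggactcg"))
```

A non-empty result is what would qualify the second sequence as a candidate new allele, subject to the confirmation requirements described above.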
Is it possible to identify IMGT/LIGM-DB sequences associated with a PubMed abstract?
Yes, it is possible, by 'LinkOut' at NCBI, to access IMGT/LIGM-DB sequences associated with a PubMed abstract.
  1. Following a query at NCBI (https://www.ncbi.nlm.nih.gov/) with an accession number (select Nucleotide), click on 'Links', then 'LinkOut', then 'The international ImMunoGeneTics database'.
  2. Following a query at NCBI with an author (select PubMed), click on 'Links', then on 'Nucleotide' (if present). You will obtain the list of GenBank accession numbers associated with the abstract. For each result, click on 'Links', then 'LinkOut', then 'The international ImMunoGeneTics database'.
How to summarize the fact that the FR-IMGT and CDR-IMGT delimitations represent the standard for FR and CDR?
The FR-IMGT and CDR-IMGT delimitations are based on the IMGT® standardization, and more particularly on the IMGT unique numbering.
See also pdf of references 217, 268 and 287
That standardization takes into account the structural data. The CDR-IMGT correspond to the loops of the variable domains. That standardization is used whatever the species, the receptor (immunoglobulin or T cell receptor), and the chain (heavy, kappa, lambda for the IG; alpha, beta, gamma, delta for the TR).
IMGT Colliers de Perles are 2D graphical representations, based on the FR-IMGT and CDR-IMGT delimitations. IMGT Colliers de Perles of antibodies with known 3D structures are available in IMGT/3Dstructure-DB.
Are the IMGT gene names the official ones?
Yes, the IMGT® gene names are the official ones. IMGT® is the international reference in ImMunoGeneTics and has delegation from the HUGO Gene Nomenclature Committee (HGNC) for the IG and TR genes. All the IMGT® gene names for human were approved by HGNC in 1999 and entered in GDB, and in Entrez Gene at NCBI, with links to IMGT/GENE-DB. The IMGT-NC works in close collaboration with HGNC and is under the aegis of the IUIS.
How can I retrieve the V leader sequences from IMGT reference sequences?
Step 1: Make your selection (species, group, functionality) in IMGT/GENE-DB (access from http://www.imgt.org). For a selection "Homo sapiens", "IGHV" and "functional", the results of your search will be, for example:
[Screenshot: IMGT/GENE-DB result 1]

Step 2: Select all genes (click in box at the bottom of the list of resulting genes) and in the "Choose your display" "IMGT label extraction from IMGT/LIGM-DB reference sequences" section, click on "Choose label(s) for extraction" and select the IMGT label "L-PART1+L-PART2" (L-PART1 and L-PART2 being shown as artificially spliced in that query).
[Screenshot: IMGT/GENE-DB result 2]

Results for "Nucleotide sequences" will be shown as follows:
>M99641|IGHV1-18*01|Homo sapiens|F|L-PART1+L-PART2|47..92+177..187
atggactggacctggagcatccttttcttggtggcagcaccaacaggtgcccactcc
>X60503|IGHV1-18*02|Homo sapiens|F|L-PART1+L-PART2|1..46+131..141
atggactggacctggagcatccttttcttggtggcagcagcaacaggtgcccactcc
>X07448|IGHV1-2*01|Homo sapiens|F|L-PART1+L-PART2|126..171+258..268
atggactggacctggaggatcctcttcttggtggcagcagccacaggagcccactcc
>X62106|IGHV1-2*02|Homo sapiens|F|L-PART1+L-PART2|21..66+152..162
atggactggacctggaggatcctcttcttggtggcagcagccacaggagcccactcc
>X92208|IGHV1-2*03|Homo sapiens|F|L-PART1+L-PART2|18..63+149..159
atggactggacctggaggatcctcttcttggtggcagcagccacaggagcccactcc
...
Results for "Amino acid sequences" will be shown as follows:
>M99641|IGHV1-18*01|Homo sapiens|F|L-PART1+L-PART2|47..92+177..187
MDWTWSILFLVAAPTGAHS
>X60503|IGHV1-18*02|Homo sapiens|F|L-PART1+L-PART2|1..46+131..141
MDWTWSILFLVAAATGAHS
>X07448|IGHV1-2*01|Homo sapiens|F|L-PART1+L-PART2|126..171+258..268
MDWTWRILFLVAAATGAHS
>X62106|IGHV1-2*02|Homo sapiens|F|L-PART1+L-PART2|21..66+152..162
MDWTWRILFLVAAATGAHS
>X92208|IGHV1-2*03|Homo sapiens|F|L-PART1+L-PART2|18..63+149..159
MDWTWRILFLVAAATGAHS
...
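The FASTA results above can be checked with a short sketch. The field names given to the header parts are labels chosen here from what the listing shows (accession number, allele name, species, functionality, label, positions); they are not an official IMGT specification. The sketch verifies that the positions account for the sequence length and that translating the nucleotide record gives the amino acid record.

```python
from itertools import product

# Standard genetic code, codons enumerated in t, c, a, g order.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Assumed names for the pipe-separated header fields shown in the listing.
FIELDS = ("accession", "allele", "species", "functionality", "label", "positions")


def parse_header(header):
    return dict(zip(FIELDS, header.lstrip(">").split("|")))


def span_length(positions):
    """Total length of ranges such as '47..92+177..187' (inclusive bounds)."""
    total = 0
    for part in positions.split("+"):
        start, end = (int(value) for value in part.split(".."))
        total += end - start + 1
    return total


def translate(nucleotides):
    nucleotides = nucleotides.lower()
    usable = len(nucleotides) - len(nucleotides) % 3
    return "".join(CODON_TABLE[nucleotides[i:i + 3]] for i in range(0, usable, 3))


header = ">M99641|IGHV1-18*01|Homo sapiens|F|L-PART1+L-PART2|47..92+177..187"
sequence = "atggactggacctggagcatccttttcttggtggcagcaccaacaggtgcccactcc"
record = parse_header(header)
print(record["allele"], span_length(record["positions"]), translate(sequence))
```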
Does IMGT allow one to make multiple alignments and derive consensus sequences?
IMGT has no specific multiple alignment tools. The curators use BLAST and CLUSTAL. However, the priority is always to maintain the gaps according to the IMGT unique numbering. If there is a conflict between the IMGT Protein displays and the BLAST/CLUSTAL results, the gaps are adjusted manually.
How are the positions of gaps and insertions placed in IMGT Collier de Perles?
CDR3-IMGT
In IMGT Collier de Perles, the positions of gaps and insertions are always at the top of the CDR3-IMGT loop. The two CDR3-IMGT anchor positions are Cysteine (C) (F strand) at position 104 of FR3-IMGT, and Phenylalanine (F) or Tryptophan (W) (G strand) at position 118 of FR4-IMGT (the F and G strands in 3D structures are antiparallel strands with conserved hydrogen bonds). See IMGT unique numbering for V-DOMAIN and V-LIKE-DOMAIN.
This numbering has been validated by superposition of 3D structures of variable domains with different CDR3-IMGT lengths.
CDR1-IMGT and CDR2-IMGT
For structural data, as recommended in Lefranc M.-P. et al. 2003, PMID:12477501 pdf, gaps may be placed at the top of the CDR1-IMGT and CDR2-IMGT loops (as is done in IMGT/3Dstructure-DB). However, when working on sequences, it is usually easier to have the gaps at the end of CDR1-IMGT and CDR2-IMGT, as is done in IMGT/V-QUEST. This allows an easier comparison of CDR-IMGT lengths according to subgroups and avoids splitting a small CDR into two parts (for example, the CDR2-IMGT of IGKV, which has only three amino acids).
Is it possible to search an amino acid sequence against the IMGT reference directory?
You can search an amino acid sequence against the IMGT domain reference directory, using IMGT/DomainGapAlign. You will get the IMGT Collier de Perles of your domain by clicking on "IMGT Collier de Perles" at the bottom of the results page.
For the IG and TR V-DOMAIN, IMGT/DomainGapAlign provides the alignment with the closest V-REGION, and the displayed "IMGT Collier de Perles" corresponds to the V-REGION in your sequence. To obtain an IMGT Collier de Perles for the complete V-DOMAIN (V-D-J-REGION or V-J-REGION), you need:
  • to complete your sequence in the window with the CDR3 and the J (at least 9 or 10 amino acids beyond the F or W, respectively of the motif F/WGXG, to get the complete J)
  • to add gaps (if the CDR3-IMGT is <13 amino acids), or additional positions (if the CDR3-IMGT is >13 amino acids).
See IMGT unique numbering for V-DOMAIN and V-LIKE-DOMAIN.
For help on the J sequences, see Alignments of alleles for the IGHJ, IGKJ and IGLJ (in IMGT Repertoire)
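Before submitting a sequence, the first requirement (enough amino acids beyond the F or W of the F/WGXG motif) can be checked with a small sketch. The function names are invented here, the two example strings are illustrative J-REGION-like ends rather than sequences taken from this page, and the motif search is a simplification: it uses the last match in the sequence, which could be misleading if a similar motif occurs further downstream.

```python
import re

J_MOTIF = re.compile(r"[FW]G.G")  # the F/WGXG motif


def residues_beyond_anchor(domain_aa):
    """Number of amino acids after the F or W of the last F/WGXG motif, or None."""
    matches = list(J_MOTIF.finditer(domain_aa.upper()))
    if not matches:
        return None
    return len(domain_aa) - matches[-1].start() - 1


def j_looks_complete(domain_aa):
    """True if at least 9 (after F) or 10 (after W) amino acids follow the anchor."""
    beyond = residues_beyond_anchor(domain_aa)
    if beyond is None:
        return False
    anchor = domain_aa.upper()[len(domain_aa) - beyond - 1]
    return beyond >= (9 if anchor == "F" else 10)


for example in ("WTFGQGTKVEIK", "YFDYWGQGTLVTVSS", "YFDYWGQGTLV"):
    print(example, residues_beyond_anchor(example), j_looks_complete(example))
```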
How to find IMGT unique numbering for a VH or VL protein domain?
IMGT/DomainDisplay allows you to query and display available domains from the IMGT domain directory (domain amino acid sequences according to the IMGT unique numbering and in the format of IMGT Protein display).
IMGT/DomainGapAlign allows you to create gaps in your own amino acid sequence, according to the IMGT unique numbering, for V-REGION or C-DOMAIN, by aligning your sequence with the closest germline V-REGION or with the closest C-DOMAIN (from the IMGT domain directory).
The tool provides the IMGT Collier de Perles of your V-REGION or C-DOMAIN by clicking on "IMGT Collier de Perles" at the bottom of the results page. You can also obtain an IMGT Collier de Perles for the complete IG and TR V-DOMAIN (V-D-J-REGION or V-J-REGION).
How to obtain IMGT/LIGM-DB entries while querying PubMed?
You can obtain IMGT/LIGM-DB entries for IG and TR sequences using LinkOut at NCBI:
  1. Query PubMed as usual to get publication abstracts http://www.ncbi.nlm.nih.gov/sites/entrez?db=pubmed (type one accession number, e.g. X97469, or an author, etc.).
  2. On the right of the abstract, click on "Links" and select "Nucleotide". You will get a list of GenBank accession numbers (with their definition) if sequences are described and quoted in the paper.
  3. For each accession number, click on "Links" and select "LinkOut".
In the next page click on "The international ImMunoGeneTics database" to obtain a direct link to the IMGT/LIGM-DB accession number.
See also LinkOut at NCBI in IMGT Index.
Is there information on the frequency and population use of particular V, D and J genes?
There are no direct data on the frequency and population use of particular V, D or J genes; however, there are some indicators of their expression, based on the number of sequences found in IMGT/LIGM-DB. This is illustrated by:
  1. the results of IMGT/GeneFrequency,
  2. the IMGT/GENE-DB tables of rearranged annotated cDNA and gDNA for each gene (at the bottom of each entry).
Regarding the alleles, as all known germline sequences are displayed in Alignments of alleles (in IMGT Repertoire), the number of germline sequences for each allele is also an indicator of the possible frequency of the different alleles in the population.
How is the CDR3 delimited?
The CDR3 is delimited by (but does not include) the anchor positions 2nd-CYS 104 and J-PHE or J-TRP 118. As for the CDR1 and CDR2 these anchor positions (that belong to the neighbouring FR) are shown as squares in the IMGT Collier de Perles.
The JUNCTION includes 2nd-CYS 104 and J-PHE or J-TRP 118 and is therefore two amino acids longer than the CDR3.
The CDR3 numbering goes from 105 to 117 and, if necessary, gaps or additional positions are added at the top of the loop http://www.imgt.org/IMGTScientificChart/Numbering/IMGTIGVLsuperfamily.html
Note that:
  1. the J-PHE or J-TRP belongs to the characteristic J-REGION motif 'F/W-G-X-G' at positions 118-121
  2. the CDR3 is delimited by the same anchor positions (2nd-CYS 104 and J-PHE or J-TRP 118), whatever the receptor type (IG or TR), the chain type (heavy or light for IG; alpha, beta, gamma or delta for TR) or the species.
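The relation between JUNCTION and CDR3 can be sketched as follows. The function name and the example junction are hypothetical, and the check on the anchors is a simplification that expects C at position 104 and F or W at position 118.

```python
CDR3_IMGT_POSITIONS = 13  # positions 105 to 117


def describe_junction(junction_aa):
    """Split a JUNCTION (anchors included) into its CDR3 and numbering needs."""
    junction_aa = junction_aa.upper()
    if len(junction_aa) < 2 or junction_aa[0] != "C" or junction_aa[-1] not in "FW":
        raise ValueError("expected 2nd-CYS 104 first and J-PHE or J-TRP 118 last")
    cdr3 = junction_aa[1:-1]  # the anchors belong to the neighbouring FR
    length = len(cdr3)
    return {
        "cdr3": cdr3,
        "cdr3_length": length,
        "gaps": max(0, CDR3_IMGT_POSITIONS - length),
        "additional_positions": max(0, length - CDR3_IMGT_POSITIONS),
    }


# Hypothetical kappa-like junction of 11 amino acids.
print(describe_junction("CQQSYSTPLTF"))
```

The sketch only reports how many gaps or additional positions are needed; where they are placed at the top of the loop follows the IMGT unique numbering rules referenced above.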
How can one appropriately and clearly number amino acids in the constant region of immunoglobulins, for example the LL of IgG1 in positions 4 and 5 of CH2 (...APELLGGP...)?
The LL of IgG1 in positions 4 and 5 of CH2 (...APELLGGP...) are at positions CH2 1.3 and 1.2 according to the IMGT unique numbering for C-DOMAIN. Lefranc, M.-P. et al., Dev. Comp. Immunol., 29, 185-203 (2005) (approved as standard by the WHO/IUIS committee and WHO/INN).
IGHG1 Alignment of alleles
In IMGT®, the numbering of the amino acids in the IG constant region is given per domain, as it is the only way to compare sequences in a standardized way between different isotypes, and this numbering is independent of the variable domain length.
Correspondence, per domain, with the IMGT unique numbering, can be obtained using the IMGT/DomainGapAlign tool.
Correspondence between the IMGT unique numbering for C-DOMAINS, the IMGT exon numbering and the EU and Kabat numberings: Human IGHG
How to find cDNA or rearranged gDNA using a given human or mouse V gene?
To find cDNA and rearranged gDNA that use a given human or mouse gene, query IMGT/GENE-DB:
  1. In the IMGT/GENE-DB Query page, use SHORT CUT: Select the 'Species' and type the 'IMGT gene name', then 'do the search'.
  2. In the resulting page, select the gene, then 'Do the search'.
  3. At the bottom of the IMGT/GENE-DB entry, you will find links to tables displaying lists of annotated IMGT/LIGM-DB cDNA and rearranged gDNA, using that gene.

You can access the same tables starting from the IMGT/GENE-DB Query page:
Click on 'IMGT/GENE-DB direct links for a given gene' (at the bottom of the Query page), and then in the resulting page select the appropriate format:

Why do "Protein displays" and "Colliers de Perles" of variable genes not include the last strand G?
The strand G is not shown in "Protein displays" and "Colliers de Perles" of variable germline genes as this strand is contributed by the J-REGION that is brought, following the V-J or V-D-J rearrangements.
The alignments of the J genes (that contribute to the G strand) and those of the D genes (for IGH, TRB and TRD) (that contribute to the CDR3-IMGT) are available in the IMGT Repertoire.
Be aware that, at the V-J and V-D-J junctions, there is trimming of the V, (D) and J, and random addition of N nucleotides, which create the diversity of the junctions (see The T cell receptor FactsBook).
To see complete variable domains, you may have a look at the IMGT Colliers de Perles in IMGT/3Dstructure-DB, for example http://www.imgt.org/3Dstructure-DB/cgi/details.cgi?pdbcode=1AO7.
How to retrieve the recombination signals (RS) from the IMGT databases?
To retrieve recombination signals (RS), for instance the human V-RS, from the IMGT® databases:
  1. Query IMGT/GENE-DB
    Example: Species: 'Homo sapiens', Group: 'IGHV', Functionality: 'Functional'
    Do the search
  2. In the result page:
    Select all genes
    then at the bottom of the page in: IMGT label extraction from IMGT/LIGM-DB reference sequences
    Choose label(s) for extraction
    V-RS
You can proceed in the same way for J-RS (Group: IGHJ), 5'D-RS and 3'D-RS (Group: IGHD).
You can also proceed in the same way for the other IG and TR loci.
How to find the hinge sequences of IG heavy chains of many species easily?
You can query IMGT/LIGM-DB by selecting in Taxonomy: Loci, genes or chains: 'Ig-heavy' then Do the search
You will get 78786 results (in January 2010). On the same page, click on 'Subsequences'. On the next page, click on 'H' in the list of labels and then on 'Get translated sequences (Fasta)'. You will get 228 sequences.
For hinges encoded by several exons, you need to query individually each exon, that is you have to go back one page and now click in the list of labels on 'H1' and repeat the same operation and again for 'H2', 'H3' and 'H4'.
How to retrieve IGHC hinge sequences?
The IGHC hinge amino acid sequences are displayed at the bottom of the 'IMGT Protein display' IGHC individual pages per species, in IMGT Repertoire. For example for 'Human' http://www.imgt.org/IMGTrepertoire/Proteins/proteinDisplays.php?species=human&latin=Homo sapiens&group=IGHC
The correspondence between nucleotide and amino acid sequences is displayed in 'IMGT Alignments of alleles', for individual genes and per species. For example, for the human IGHC genes, links are at the page: http://www.imgt.org/IMGTrepertoire/Proteins/alleles/list_alleles.php?species=Homo sapiens&group=IGHC
You can retrieve nucleotide and amino acid hinge sequences from IGHC genes which are annotated in IMGT/GENE-DB, that is human, mouse, rat and rabbit, and also IGHD trout and Danio rerio:
  1. Query IMGT/GENE-DB
  2. Select :
    • Gene type: constant
    • Group: IGHC
  3. Do the search.
    In the resulting page, you will get a list of 52 genes. Select manually the IGHA, IGHD and IGHG genes of interest.
  4. Click on the radio button of 'Choose label for extraction'. Select H, H1, H2, H3, H4 (together).
  5. Click on the radio button of: 'Amino acid sequences' (see results below after selecting all IGHA, IGHD and IGHG).
    You can do the same kind of query for nucleotide sequences: Click on the radio button of: 'Nucleotide sequences'.
Why is the delimitation of IGKJ2 given as 1094-1132 in V00777 from IMGT/LIGM-DB, whereas it is given as 1096-1134 in the literature?
From V00777 from IMGT/LIGM-DB:
J-REGION [740..777]
gtggacgttcggtggaggcaccaagctggaaatcaaac
W  T  F  G  G  G  T  K  L  E  I  K

J-REGION [1094..1132]
tgtacacgttcggaggggggaccaagctggaaataaaac
Y  T  F  G  G  G  T  K  L  E  I  K
  1. The definition of the IMGT label "J-REGION" includes the additional 1 or 2 nucleotides that can be found following the J-HEPTAMER. Indeed these nucleotides can be found in V-(D)-J junctions when the rearrangement occurs without trimming of the J region.
    IMGT label definition:
    J-REGION: coding region of J-GENE (plus 1 or 2 nucleotide(s) after J-HEPTAMER, if present) or corresponding region in cDNA
    
    In genomic sequences J-HEPTAMER and J-REGION are therefore contiguous (numbers in green below).
  2. As these additional nucleotides are taken into account in IMGT, "codon_start" is added to indicate the nucleotide (nt) of the IMGT coding label at which automatic tools should start the translation: codon_start=2 means that it is the 2nd nt (the "t" of tgg) for IGKJ1*01; codon_start=3 means that it is the 3rd nt (the "t" of tac) for IGKJ2*01. These nucleotides are shown in bold in the sequences above.
    FT J-HEPTAMER 733..739
    FT J-REGION 740..777
    FT /note="functional"
    FT /allele="IGKJ1*01"
    FT /gene="IGKJ1"
    FT /codon_start=2
    FT /translation="WTFGGGTKLEIK"
    
    FT J-HEPTAMER 1087..1093
    FT J-REGION 1094..1132
    FT /note="functional"
    FT /allele="IGKJ2*01"
    FT /gene="IGKJ2"
    FT /codon_start=3
    FT /translation="YTFGGGTKLEIK"
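The effect of "codon_start" can be checked with a short Python sketch that translates the two J-REGION sequences quoted above from V00777. The helper name `translate_from_codon_start` is hypothetical and the code is only an illustration of the rule, not the IMGT implementation; it uses the standard genetic code and ignores any incomplete trailing codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in t, c, a, g order.
BASES = "tcag"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_from_codon_start(nt_seq: str, codon_start: int) -> str:
    """Translate nt_seq starting at the 1-based nucleotide codon_start."""
    seq = nt_seq.lower()[codon_start - 1:]
    full_codons = len(seq) // 3  # an incomplete last codon is not translated
    return "".join(CODON_TABLE[seq[3 * i:3 * i + 3]] for i in range(full_codons))


# J-REGION sequences as given in the V00777 entry above.
igkj1 = "gtggacgttcggtggaggcaccaagctggaaatcaaac"   # J-REGION 740..777
igkj2 = "tgtacacgttcggaggggggaccaagctggaaataaaac"  # J-REGION 1094..1132

print(translate_from_codon_start(igkj1, 2))  # WTFGGGTKLEIK
print(translate_from_codon_start(igkj2, 3))  # YTFGGGTKLEIK
```

For IGKJ2*01, codon_start=3 places the first translated nucleotide at position 1096 of V00777 (1094 + 2).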
    
How to obtain from IMGT/GENE-DB complete amino acid sequences (artificially spliced) of a constant gene, or of a group of C genes?
  1. To get the complete amino acid sequence (artificially spliced) of a reference constant gene, the query is 13.2
    • http://www.imgt.org/genedb/GENElect?query=13.2+Genesymbol&species=Species
      with for Genesymbol, the gene name (for instance IGHG1), and for Species, the latin name of the species (for instance Homo+sapiens) http://www.imgt.org/genedb/GENElect?query=13.2+IGHG1&species=Homo+sapiens
  2. To get the artificially spliced nucleotide sequence of that gene, the query is 13.1
  3. To get the complete amino acid sequence (artificially spliced) of reference constant genes of a group, the query is 14.2
    • http://www.imgt.org/genedb/GENElect?query=14.2+Group&species=Species
      with for Group, the group name (for instance IGHC), and for Species, the latin name of the species (for instance Homo+sapiens) http://www.imgt.org/genedb/GENElect?query=14.2+IGHC&species=Homo+sapiens
  4. To get the artificially spliced nucleotide sequence of that group, the query is 14.1
Information on the direct links is available at: http://www.imgt.org/genedb/share/textes/GENEDBDirectLinks.html
You can access that page at the bottom of the IMGT/GENE-DB Query page.
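The direct links listed above follow a single pattern, which can be assembled programmatically. The sketch below only builds the URL strings shown in this answer; the function name `genelect_url` is hypothetical, and no assumption is made here about the format of the page returned by the server.

```python
from urllib.parse import quote_plus

GENELECT_BASE = "http://www.imgt.org/genedb/GENElect"


def genelect_url(query: str, name: str, species: str) -> str:
    """Build an IMGT/GENE-DB direct link.

    query   -- query number as described above: "13.1", "13.2", "14.1" or "14.2"
    name    -- gene name (for 13.x) or group name (for 14.x)
    species -- Latin species name, for instance "Homo sapiens"
    """
    # quote_plus turns the space of the Latin name into '+', as in the examples above.
    return f"{GENELECT_BASE}?query={query}+{name}&species={quote_plus(species)}"


# Amino acid sequence (artificially spliced) of a reference constant gene.
print(genelect_url("13.2", "IGHG1", "Homo sapiens"))
# Amino acid sequences (artificially spliced) of the constant genes of a group.
print(genelect_url("14.2", "IGHC", "Homo sapiens"))
```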
How can I get the sequences of the recombination signals?
To get the recombination signals:
  1. query IMGT/GENE-DB
    • Example: Species: Homo sapiens, Group: IGHV, Functionality: Functional
    • Do the search.
  2. In the result page:
    • Select all genes
    • then at the bottom of the page in: 'IMGT label extraction from IMGT/LIGM-DB reference sequences'
    • 'Choose label(s) for extraction':
      • V-RS.
      • For J genes: the label is J-RS
      • For D genes: the labels are 5'D-RS and 3'D-RS.
What does the letter 'S' mean in the temporary IMGT gene nomenclature, for example in IGHV1S1 or IGHJ1S1 ?
The letter S means 'subgroup' for V genes and possibly C genes, and it means 'sequential' for J and D genes: IMGT gene name nomenclature for IG and TR of human and other vertebrates
What are the functionalities of rearranged V-(D)-J genes compared to germline V, D or J genes?
There are two possible functionalities for rearranged V-(D)-J genes: productive or unproductive.
In contrast, the germline V, D or J genes, and the C genes, have three possible functionalities (like conventional genes): functional, ORF or pseudogene.
A complete identification of rearranged V-(D)-J genes requires the identification of the two (or three) genes and alleles which are involved in the rearrangement.
This can be determined using IMGT/V-QUEST. IMGT/V-QUEST provides the functionality of the rearranged V-(D)-J gene. It also provides the gene and allele names of the closest V, D (in the IMGT/JunctionAnalysis section) and J genes involved in the rearrangement and their germline functionality.
How to deal with 'complement' sequences?
We can take as an example AL122127. AL122127 in the nucleotide databases 'GEDI' (GenBank, ENA, DDBJ and IMGT/LIGM-DB) is a genomic sequence of 169,802 nucleotides (FASTA in IMGT/LIGM-DB AL122127).
AL122127 contains several constant IGHC genes, including Homo sapiens IGHG1*05 and IGHG3*10, as shown in IMGT/LIGM-DB 'Annotation' (DESCRIPTION section) and the sequence is 'complement' with respect to the orientation of the IGHC genes.
The positions of the IMGT labels are indicated for the 'complement' sequence (for example, complement(24528..169802) for the IGHG3 gene). However, for a more user-friendly display, the nucleotide sequence and the translation of each IMGT coding label are displayed in the direct orientation of the gene.
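Going from positions given on a 'complement' sequence to a sequence in the direct orientation of the gene amounts to taking the reverse complement of the corresponding stretch. The following Python sketch illustrates this under simple assumptions: positions are 1-based and inclusive, as in the feature table, and the function names and the short example sequence are hypothetical.

```python
# Translation table for unambiguous nucleotides plus 'n', in both cases.
_COMPLEMENT = str.maketrans("acgtnACGTN", "tgcanTGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_feature(seq: str, start: int, end: int, complement: bool = False) -> str:
    """Extract positions start..end (1-based, inclusive) from seq.

    If the feature is annotated as complement(start..end), the extracted
    stretch is returned in the direct orientation of the gene.
    """
    stretch = seq[start - 1:end]
    return reverse_complement(stretch) if complement else stretch


# Hypothetical short sequence used only for illustration.
example = "aaatgcgg"
print(extract_feature(example, 3, 6, complement=True))  # gcat
```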
In 'Alignments of alleles' why is the first nucleotide of the exons in 'pink' color?
In 'Alignments of alleles', the first nucleotide in 5' of an exon is shown in pink when it comes from the preceding exon and contributes to the codon resulting from a splicing frame 1 (codon NNN from sf1) (see Splicing sites) to encode the first amino acid of the translated region.
How to propose new IG or TR gene names to IMGT?
To propose new IG or TR genes to IMGT, the genomic sequence should be publicly available (in generalist databases or in a genome assembly).

Information to be provided

The information to be provided comprises, for each gene:
1) the public accession number of the clone from which the sequence was extracted (or of the NCBI assembly positions and version), with the proposed gene name (provisional), positions start and end in that accession number,
2) the corresponding sequence in FASTA format:
- for V genes: from the beginning of the L-PART1 (atg) to the 3'end of the V-RS
- for D genes: from the 5' end of the 5'D-RS to the 3' end of the 3'D-RS
- for J genes: from the 5' end of the J-RS to the 3'end of the J-REGION
- for C genes: from the 5' end of the first exon to the 3' end (stop codon) of the last exon.
3) the proposed functionality (F, ORF, P), and comments (any comment which can be useful, for example, why a gene is considered 'ORF' or 'P')
4) additionally, for V genes,
- the CDR-IMGT lengths (the germline CDR3-IMGT includes, if present, the one or two nucleotides upstream of the V-HEPTAMER)
- the closest human V-REGION gene and allele with score, percentage of identity and alignment length ratio, as provided by IMGT/V-QUEST.
For pseudogenes, the above information should be obtained using the option 'Search for insertions and deletions in V-REGION', with a summary of the out-of-frame defects (for example, '1 del (1), 2 ins (1,2)' for one deletion of 1 nucleotide (nt) and 2 insertions of 1 nt and 2 nt).

IG and TR subgroup numbers

The IG and TR subgroup numbers are assigned, whenever it is possible, by comparison to the Homo sapiens subgroups (V-REGION nucleotide sequence identity >75% for functional and ORF genes and, for information, CDR-IMGT lengths). This means that some Homo sapiens subgroup numbers may not be represented in some species or conversely that new numbers may be added for subgroups not represented in Homo sapiens.
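The >75% identity criterion can be illustrated with a minimal Python sketch. It assumes two V-REGION nucleotide sequences that are already aligned to the same length, with gaps written as dots; ignoring gapped columns is an assumption of this sketch, and the function names and example strings are hypothetical. It does not reproduce the IMGT assignment procedure, which also takes the CDR-IMGT lengths into account for information.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of identical nucleotides between two aligned sequences.

    Columns where either sequence has a gap ('.') are ignored (assumption).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.lower(), aligned_b.lower())
        if x != "." and y != "."
    ]
    if not columns:
        raise ValueError("no comparable positions")
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


def may_share_subgroup(aligned_a: str, aligned_b: str, threshold: float = 75.0) -> bool:
    """True if the identity is strictly above the threshold (>75% in the text)."""
    return percent_identity(aligned_a, aligned_b) > threshold


# Hypothetical aligned fragments used only for illustration.
print(percent_identity("acgtacgt", "acgtacga"))    # 87.5
print(may_share_subgroup("acgtacgt", "acgtacga"))  # True
```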

IG and TR gene numbers.

More information

Lefranc M-P. Immunoglobulin (IG) and T cell receptor genes (TR): IMGT® and the birth and rise of immunoinformatics. Front Immunol. 2014 Feb 05;5:22. doi: 10.3389/fimmu.2014.00022. Open access. PMID:24600447

Does "partial in 3' " or "partial in 5' " make sequences non-functional or is it just that the gene region was not fully sequenced?
"partial in 3' " or "partial in 5' " indicates that the gene region was not fully sequenced.
There are a few sequences that are marked "F" but end in a stop codon. Are these functional?
When there is a high probability that the stop codon will be trimmed and replaced by an N-region during the V-(D)-J rearrangement, the gene is considered as functional.
What does 'OR9' mean in some Homo sapiens TRBV genes?
'OR9' in the IMGT gene name indicates that this gene is an orphon 'OR', located on chromosome 9.
An orphon is a gene identified in a chromosomal location outside the main loci and which therefore cannot participate in the in vivo IG or TR chain synthesis.
http://www.imgt.org/IMGTindex/orphon.php
http://www.imgt.org/IMGTScientificChart/Nomenclature/IMGTnomenclature.html
In IMGT/V-QUEST, orphons are excluded from 'F+ORF+ in-frame P' (used, for instance, for repertoire analysis) but may be searched if relevant (for instance, for genomic analysis) using the options 'including orphons' in Advanced parameters.
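When handling lists of gene names, the 'OR' tag can be detected with a simple pattern. The Python sketch below assumes only what is stated above, namely that 'OR' is followed by the chromosome number; the function name is hypothetical and the gene names are used as illustrative strings.

```python
import re
from typing import Optional

# 'OR' followed by the chromosome number, for instance 'OR9' (assumption:
# only numbered chromosomes are handled in this sketch).
_ORPHON_PATTERN = re.compile(r"OR(\d+)")


def orphon_chromosome(gene_name: str) -> Optional[str]:
    """Return the chromosome number of an orphon, or None if the name has no 'OR' tag."""
    match = _ORPHON_PATTERN.search(gene_name)
    return match.group(1) if match else None


# Illustrative names: one with an 'OR9' tag, one without.
for name in ("TRBV20/OR9-2", "TRBV20-1"):
    print(name, orphon_chromosome(name))
```

Such a filter could be used, for instance, to separate orphons from genes of the main locus before a repertoire analysis.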
What happens if there is a stop codon in a leader sequence?
A stop codon in the leader sequence makes a gene or an allele a pseudogene. This is the case for the Homo sapiens IGKV1-39*02 allele, which is noted as 'P' (pseudogene) in Gene table Homo sapiens IGKV (http://www.imgt.org/IMGTrepertoire/index.php?section=LocusGenes&repertoire=genetable&species=human&group=IGKV). Such an allele or gene may be transcribed (and therefore found in 5'RACE/NGS sequencing) but is not translated in vivo.
How to retrieve the nucleotide sequences of IG and/or TR genes without any introns?
Nucleotide sequences of IG and/or TR genes without any introns can be retrieved from the 'artificially spliced sets' at the page: http://www.imgt.org/vquest/refseqh.html#VQUEST
Are nucleotide sequence differences in the untranslated EX4 taken into account in the assignment of the TRAC and TRDC alleles?
Nucleotide sequence differences in the untranslated EX4 + 3'UTR (IMGT label EX4UTR; Figure 7, pages 34-35 [2]) are not considered in the assignment of the TRAC and TRDC alleles [1].
A given TRAC allele or TRDC allele (for example, Homo sapiens TRAC*01) from different sources may therefore display small differences in the EX4UTR.
More information:
[1] Lefranc M-P. Immunoglobulin (IG) and T cell receptor genes (TR): IMGT® and the birth and rise of immunoinformatics. Front Immunol. 2014 Feb 05;5:22. doi: 10.3389/fimmu.2014.00022. Open access. PMID: 24600447.
[2] Lefranc, M.-P. and Lefranc, G., The T cell receptor FactsBook, Academic Press, 398 pages (2001) ISBN:0124413528.