Where can I find known human IG allotype sequences?
For Gm allotype sequences, the IMGT/LIGM-DB accession numbers of the sequences corresponding to the Gm allotypes are indicated in "Gene table: Human (Homo sapiens) IGHC" in IMGT Repertoire.
The corresponding IGHG allele sequences in FASTA format (per exon) are available from IMGT/GENE-DB.
For Km allotypes, the correspondence between Km alleles and IGKC allele names is available in
"Allotypes: Human IGKC" in IMGT Repertoire.
The corresponding IGKC allele sequences in FASTA format are available from IMGT/GENE-DB.
Why do the V and J assignments of rearranged human IG and TR sequences differ between IMGT/LIGM-DB and the generalist databases GenBank/EMBL/DDBJ, although the flat file accession numbers are identical?
IMGT/LIGM-DB provides annotated flat files and uses the official nomenclature of the human immunoglobulin
(IG) and T cell receptor (TR) genes, defined by IMGT and approved by the HUGO Gene Nomenclature Committee (HGNC)
in 1999. The official nomenclature is used by GeneCards and by Entrez Gene at NCBI:
example of an Entrez Gene
If you use IMGT/V-QUEST to analyse the rearranged IG or TR sequences, you will find the correct gene
and allele assignment.
Citing IMGT/V-QUEST: PMID: 15215425.
How can I retrieve the complete set of translations of the human immunoglobulin germline sequences?
The following direct links provide the translation of the human immunoglobulin germline sequences, including all known alleles (designated as *01, *02, etc.). Because the alleles are described at the nucleotide level, two alleles may have identical amino acid sequences.
Are the sequences of the T cell receptor (TR) chains from the Jurkat cell line known?
The sequences of the T cell receptor (TR) chains from the JM/Jurkat cell line
(JM and Jurkat are the same cell line) are known. The accession numbers of the TRBV-(D)-J
and TRAV-J rearrangements are the following:
Where can I find the complete sequences of the human TRA/TRD and TRB loci?
The complete sequence of the human TRA/TRD locus is covered by the five accession numbers AE000658-AE000662,
which are adjacent to each other.
The TRA/TRD locus localization (start and end: 127701-1058955 bp) and the gene positions
are those in the clone contig that starts at the beginning of AE000658 and ends at the end of AE000662.
The complete sequence of the human TRB locus is contained in the L36092 accession number in IMGT/LIGM-DB.
The TRB locus localization (start and end: 91557-667340 bp) and the gene positions are those in the L36092
accession number. The original L36092 sequence (684973 bp) has been split in EMBL into three overlapping sequences of
267156 bp (U66059), 215422 bp (U66060) and 232650 bp (U66061), and L36092 has become a secondary
accession number of U66059, U66060 and U66061. In IMGT, the original unsplit sequence L36092, which is fully
annotated, has also been kept as a primary accession number, in addition to U66059, U66060 and U66061.
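The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is only illustrative: it assumes that the start and end positions are 1-based and inclusive, and that the three EMBL pieces together cover the original L36092 sequence exactly (so that the excess of their summed lengths over 684973 bp is the total overlap). The function name span_length is hypothetical.

```python
def span_length(start, end):
    """Length in bp of a span given 1-based, inclusive coordinates (assumed convention)."""
    return end - start + 1

# Locus spans as quoted in the text.
tra_trd_length = span_length(127701, 1058955)
trb_length = span_length(91557, 667340)

# Lengths of the three EMBL sequences derived from L36092.
pieces = {"U66059": 267156, "U66060": 215422, "U66061": 232650}
original_length = 684973

# If the pieces tile the original sequence, the surplus is the total overlap.
total_overlap = sum(pieces.values()) - original_length

print("TRA/TRD locus length:", tra_trd_length)
print("TRB locus length:", trb_length)
print("Total overlap between the split sequences:", total_overlap)
```

Under these assumptions the summed piece lengths exceed the original length, which is consistent with the statement that the three sequences overlap.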
Is there a way to cross-reference the IMGT named TR genes with the more common names and to correlate IMGT sequences with commercially available flow-cytometry antibodies?
In each case, the query can be done by an automatic search on the page with the common name.
Another, more general way is to search the IMGT site with Google (available from the
IMGT Home page).
Is it possible to retrieve the human CDR-IMGT from the IMGT® databases?
Yes, it is possible to retrieve the human CDR-IMGT from the IMGT® databases.
For CDR1-IMGT and CDR2-IMGT (and germline CDR3-IMGT) from germline genes:
Example: Species: 'Homo sapiens', Group: 'IGHV', Functionality: 'Functional'
Do the search
In the result page:
Select all genes
then at the bottom of the page in: IMGT label extraction from IMGT/LIGM-DB reference sequences
Choose label(s) for extraction
For instance CDR1-IMGT
You will get nucleotide sequences.
For amino acid sequences, also select 'Amino acid sequences' below.
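If only the nucleotide sequences were extracted, they can also be translated locally. The following is a minimal sketch, not the IMGT method: it uses the standard genetic code, assumes the extracted sequence starts in frame, and assumes that any IMGT gap dots ('.') occur in multiples of three so that they can simply be removed. The function name translate and the example input are illustrative, not taken from an IMGT result.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(nucleotides):
    """Translate an in-frame nucleotide string; unknown codons become 'X'."""
    seq = nucleotides.upper().replace(".", "")  # drop gap dots (assumption)
    usable = len(seq) - len(seq) % 3            # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )

# Illustrative 24-nucleotide input (hypothetical, not fetched from IMGT).
print(translate("GGATTCACCTTCAGTAGCTATGCT"))
```

Selecting 'Amino acid sequences' in the interface remains the simpler route; a local translation is mainly useful for checking or batch processing.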
For CDR1-IMGT, CDR2-IMGT and CDR3-IMGT from rearranged sequences
Query IMGT/LIGM-DB in Taxonomy
English name of species: 'human'
Loci, genes or chains: 'Ig-Heavy'
Do the search
Then, on the page with the number of results choose "Subsequences" and, in the window, the label (CDR3-IMGT, for example).
Choose the type of display:
Get subsequences in Fasta format
Get translated subsequences (Fasta).
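Once the subsequences have been saved in Fasta format, they can be processed with a short script. The sketch below assumes only the generic Fasta layout (a header line starting with '>' followed by one or more sequence lines); nothing is assumed about the content of the IMGT header lines. The function names and the example records are hypothetical.

```python
from collections import Counter

def parse_fasta(text):
    """Return a list of (header, sequence) tuples from Fasta-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def length_distribution(records):
    """Count how many sequences have each length."""
    return Counter(len(seq) for _, seq in records)

# Hypothetical translated CDR3-IMGT subsequences, for illustration only.
example = ">seq1 hypothetical\nARDYW\n>seq2\nARGG\nYFDY\n>seq3\nAKDLW\n"
records = parse_fasta(example)
for length, count in sorted(length_distribution(records).items()):
    print(length, count)
```

A length distribution is just one example of what can be derived from the extracted subsequences.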
For CDR1-IMGT, CDR2-IMGT and CDR3-IMGT amino acid sequences from known 3D structures
In Search by Species and Group, Subgroup, Gene or Allele (CLASSIFICATION):
Select: 'Homo sapiens' and then IMGT group: 'IGHV'
For Results, Choose: FR-IMGT or CDR-IMGT sequences: CDR3-IMGT (for example)
Do the search
In the human TRGJP, TRGJP1 and TRGJP2 gene names, what does P stand for and what is the relation between P and P1 and P2?
The letter 'P' stands for Kpn, as the rearrangements of TRGJP, TRGJP1 and TRGJP2 are detected by
that enzyme (the first letter, K, could not be used because it indicates kappa).
TRGJP is a unique J gene (and was the first one found), whereas TRGJP1 and TRGJP2
(found later) are duplicated genes.
What are the rules for designating the constant domains of the immunoglobulin heavy chains?
The constant domains of the immunoglobulin (IG) heavy chains are designated with CH
(CH1, CH2, CH3, CH4, and in Teleostei, CH5...).
The same designation is used for nucleotide and amino acid sequences, and for 3D structures.
The designation CH is valid whatever the heavy chain type (mu, delta, gamma, epsilon, alpha) and
whatever the constant gene that encodes the heavy chain constant region (C-REGION).
The assignment to a given gene or chain is indicated by the gene name (or by the chain type name).
IGHM CH1 (or IG heavy mu CH1)
IGHM CH2 (or IG heavy mu CH2)...
IGHD CH1 (or IG heavy delta CH1)
IGHD CH2 (or IG heavy delta CH2)...
IGHG1 CH1 (or IG heavy gamma1 CH1)
IGHG1 CH2 (or IG heavy gamma1 CH2)...
IGHG3 CH1 (or IG heavy gamma3 CH1)
IGHG3 CH2 (or IG heavy gamma3 CH2)...
IGHE CH1 (or IG heavy epsilon CH1)...
IGHA1 CH1 (or IG heavy alpha1 CH1)...
This designation has been approved by the WHO/IUIS Nomenclature Subcommittee for IG and TR.
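The designation rule can be expressed as a small lookup. The sketch below is illustrative only: the mapping covers just a few of the genes listed above, the function name ch_designation is hypothetical, and no check is made on how many constant domains a given gene actually encodes.

```python
# Chain type names for some of the genes listed above (partial, illustrative mapping).
CHAIN_TYPE = {
    "IGHM": "mu",
    "IGHD": "delta",
    "IGHG1": "gamma1",
    "IGHE": "epsilon",
}

def ch_designation(gene, domain_number):
    """Return both forms of a constant domain designation for a gene.

    The domain number is not validated against the gene (assumption:
    the caller knows which CH domains exist for that gene).
    """
    if gene not in CHAIN_TYPE:
        raise KeyError(f"no chain type recorded for {gene}")
    if domain_number < 1:
        raise ValueError("domain numbers start at 1")
    domain = f"CH{domain_number}"
    return f"{gene} {domain}", f"IG heavy {CHAIN_TYPE[gene]} {domain}"

by_gene, by_chain = ch_designation("IGHM", 2)
print(f"{by_gene} (or {by_chain})")
```

The point of the sketch is that the domain label (CH1, CH2, ...) is independent of the gene, and the gene or chain type name is simply prefixed to it.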
Human IGHC refers to the group that includes all
the IG heavy constant genes found in humans.
The standardization is useful to compare sequences and 3D structures.
Thus IMGT® detected an error in the b12 sequence (1hzh in PDB and IMGT/3Dstructure-DB),
the only complete human IG crystallized. This is indicated in a note in the
"The presence of an A (Ala) in CH1 121 of 1hzh_H is a PDB file error.
It should be a V (Val) as in 1n0x_H. The sequence of 1hzh_H should be IGHG1*01 100% in its entirety.
This has been confirmed by Ann Hessel and Dennis Burton (21/07/08) in answer to a question
by Marie-Paule Lefranc".
Each constant domain can be represented by a standardized IMGT Collier de Perles using the IMGT
unique numbering for C-DOMAIN.
(the one with the error mentioned above).
Are there, for my teaching, some exercises with answers to illustrate the use of IMGT® for immunoglobulin sequence analysis and 3D structure visualization of immunoglobulins?
You can use:
IMGT/V-QUEST (copying examples that are in the IMGT/V-QUEST Documentation)
IMGT/3Dstructure-DB (querying, for example, b12 as 'Molecule name').
If you want to explore all the possibilities of the IMGT/V-QUEST tool and IMGT/3Dstructure-DB database,
you can easily spend 4 hours on each one, with your students.
Many concepts concerning immunoglobulin synthesis (see
IMGT Education, for example
Molecular genetics of immunoglobulins),
gene and locus organization (IMGT Repertoire),
2D structures (IMGT Colliers de Perles) and 3D structures (3D visualization with Jmol or QuickPDB, contact analysis)
can be conveyed starting from IMGT/V-QUEST and IMGT/3Dstructure-DB, and their respective Documentation.
What is the easiest way to identify the N glycosylation sites of the human germline IGHV, IGKV and IGLV genes? And those of the human germline IGHJ, IGKJ and IGLJ genes?
For the human germline IGHV, IGKV and IGLV genes, the easiest way is to query IMGT/DomainDisplay for:
Species: 'Homo sapiens'
then click on 'Show sequences'.
The N glycosylation sites are indicated with the letter N in green.
The human germline IGHJ, IGKJ and IGLJ genes do not have N glycosylation sites.
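For amino acid sequences handled outside IMGT/DomainDisplay, potential N glycosylation sites can be located with a simple pattern search. The sketch below uses the commonly cited sequon N-X-S/T, where X is any amino acid except proline; whether IMGT/DomainDisplay applies exactly this rule is an assumption, and the function name and example sequence are hypothetical.

```python
import re

# N followed by any residue except P, then S or T.
# The lookahead lets overlapping sequons be reported individually.
SEQUON = re.compile(r"N(?=[^P][ST])")

def n_glycosylation_sites(protein):
    """Return 0-based positions of the N in each potential N-X-S/T sequon."""
    return [match.start() for match in SEQUON.finditer(protein.upper())]

# Hypothetical sequence: N-G-S and N-A-T are sequons, N-P-T is not.
print(n_glycosylation_sites("AANGSAANPTAANAT"))
```

A match only indicates a potential site; it does not show that the position is actually glycosylated.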
Are there T cell receptor haplotypes defined from the human genome sequencing?
The description of T cell receptor haplotypes would be a great step forward.
For the time being, only genes that have been sequenced on a physical BAC or YAC can be linked and associated.
Trying to collect that information from the generalist databases is a huge task, as there is a lot of uncertainty:
the generalist databases have combined different clones to make larger and larger contigs.
Indeed, as the aim of the human genome sequencing was to obtain a complete sequence, pieces of DNA coming from several individuals
were assembled, without (at least public) information on their possible assignment to one individual.
The current human public sequences are therefore 'virtual' (there is no individual with such a sequence).
However, human genome sequences from individuals now exist, the first being that of
James D. Watson
followed by that of
Note that the page "IMGT/GENE-DB direct links"
lists all the available customizable direct links. This page is referenced at the bottom of the IMGT/GENE-DB query and result pages.
In IMGT/V-QUEST, when the message 'Nucleotide insertions have been detected and automatically removed...' is displayed, what do the insertions (or deletions) mean? Are they artefacts that have been introduced during sequence amplification or sequencing? Are they natural variants?
IMGT/V-QUEST detects insertions (or deletions) which may be either artefacts introduced during sequence amplification
or sequencing, or indels appearing in some clones (for example, in chronic lymphocytic leukemia (CLL)).
'Insertions (or deletions)' present in some alleles by comparison with other alleles of the same gene
(alleles defined as polymorphic variants of the gene at the genomic level
in germline configuration, see 'Alignments of alleles') are included in the IMGT reference directory
and therefore are not considered as 'insertions (or deletions)' by IMGT/V-QUEST.
Insertions of amino acids in the CDR1 or CDR2 may have been functionally selected in the rearranged productive domains of antibodies.
However, the quality of the sequencing should be carefully checked if there is no information available on the specificity.
Insertions in the FR of productive antibodies are rare, but possible (again, the sequence should be carefully checked).
The option 'Search for insertions or deletions' was added to IMGT/V-QUEST in response to requests from clinicians
who need the percentage of somatic hypermutations in the VH as a prognostic factor in CLL.
The 'Search for insertions or deletions' option is used by default in IMGT/HighV-QUEST as there is a high frequency
of indels due to homopolymer hybridization in NGS 454 sequencing.
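One way to see why an undetected insertion matters for a mutation percentage is to compare a sequence with its germline counterpart position by position. The sketch below is a toy illustration and not the IMGT/V-QUEST algorithm: the sequences are invented, the comparison is a plain column-by-column count that skips gap characters, and the position of the inserted nucleotide is taken as known.

```python
def percent_mutation(germline, query, gap_chars=".-"):
    """Percentage of compared positions that differ, skipping gap columns.

    The comparison stops at the end of the shorter sequence.
    """
    compared = mismatches = 0
    for g, q in zip(germline.upper(), query.upper()):
        if g in gap_chars or q in gap_chars:
            continue
        compared += 1
        if g != q:
            mismatches += 1
    return 100.0 * mismatches / compared if compared else 0.0

# Invented sequences: the read carries one extra T in a 'TT' homopolymer.
germline = "ACGTTGCAAGCT"
read_with_insertion = "ACGTTTGCAAGCT"

# Naive comparison: every position after the insertion is shifted.
print(percent_mutation(germline, read_with_insertion))

# After removing the inserted nucleotide (index 5 here), the read matches.
read_corrected = read_with_insertion[:5] + read_with_insertion[6:]
print(percent_mutation(germline, read_corrected))
```

In this toy case a single inserted nucleotide turns an unmutated sequence into one that appears heavily mutated, which suggests why indels need to be identified before a somatic hypermutation percentage is interpreted.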