Why are immunoglobulin and T cell receptor genes described as "genes", instead of "gene segments" or "segments"?
Immunoglobulin and T cell receptor genes were accepted as "genes" by the Human Genome Organisation (HUGO) Gene Nomenclature Committee (HGNC) in 1999. It was the only way to have the IG and TR genes entered into the general genome databases (LocusLink, GDB, GeneCards, Entrez Gene) and to define and characterize alleles in a standardized way. This has been accepted because different definitions of a gene coexist in biology. Moreover, each IG or TR gene has its own promoter and can be transcribed as an independent transcript.
Why are abbreviations of IG and TR used, instead of Ig and TcR?
The official nomenclature for the immunoglobulin and T cell receptor genes (approved by HGNC in 1999) starts with the 2 letters "IG" and "TR". IG and TR are therefore used when referring to the genes, loci and chains, whereas Ig and TcR are used for a more general description. The abbreviation TCR should be avoided as, being in capital letters, it creates confusion with gene names.
Why are immunoglobulin and T cell receptors in "subgroups", rather than in "families"?
A subgroup is part of a group. Both a subgroup and a group are well defined entities, whereas a family is not.
Why is there sometimes capitalization in midsentence?
Capital letters, if in midsentence, indicate sections of the IMGT Repertoire (for example: Alignments of alleles, Tables of alleles, etc.).
Is it judicious to use an L-PART1 oligonucleotide to amplify V-REGIONs?
L-PART1 corresponds to the exon that encodes the first part (the longest one) of the leader. An oligonucleotide in L-PART1 will work well on cDNAs, for the amplification of V-REGIONs. On genomic DNAs, such an oligonucleotide will also amplify the intron between L-PART1 and L-PART2.
See Variable region representation in gDNA and cDNA.
Where can I find known human IG allotype sequences?
For Gm allotype sequences, the IMGT/LIGM-DB accession numbers of the sequences that correspond to the Gm allotypes are indicated in "Gene tables: Human IGHC" in IMGT Repertoire.
The corresponding IGHG allele sequences in FASTA format (per exon) are available from IMGT/GENE-DB.
For Km allotypes, the correspondence between Km alleles and IGKC allele names is available in "Allotypes: Human IGKC" in IMGT Repertoire.
The corresponding IGKC allele sequences in FASTA format are available from IMGT/GENE-DB.
What are the differences between allotypes and alleles?
The definition of "allotypes" requires that antibody reagents are available to determine the allotypes serologically. If the determination is only done at the sequence level, the polymorphisms have to be described as "alleles".
This does not prevent a correspondence with allotypes from being established, if the allele-allotype correspondence has been experimentally proven, or if the individual sequence is identical to a sequence for which it has been demonstrated.
Why are there differences in the V and J assignments of rearranged human IG and TR sequences, between IMGT/LIGM-DB and the generalist databases GenBank/EMBL/DDBJ, although the flat file accession numbers are identical?
IMGT/LIGM-DB provides annotated flat files and uses the official nomenclature of the human immunoglobulin (IG) and T cell receptor (TR) genes, defined by IMGT and approved by the HUGO Gene Nomenclature Committee (HGNC) in 1999. The official nomenclature is used by GeneCards, LocusLink and Entrez Gene at NCBI:
Example of an Entrez Gene.
The IMGT/V-QUEST tool analyses rearranged IG or TR sequences and provides the correct gene and allele assignment
of the closest germline genes.
Total number of IG and TR genes
Number of functional IG and TR genes
Number of genes in the IMGT genome analysis tools
The other links lead to other pages:
Potential germline repertoires
Questions and answers (IMGT Education): Nomenclature and overview of the human immunoglobulin genes
Questions and answers (IMGT Education): Nomenclature and overview of the human T cell receptor genes
There is also some information in French at IMGT Education > Questions and Answers >
Gènes et locus
Is it possible to get restriction maps for the IG and TR loci?
The restriction maps of the IG and TR loci are not stored on the IMGT site.
One way to proceed is to go to
This will allow you to identify the clones that contain the genes you are looking for
(if you are only interested in a given gene, you can query
IMGT/GeneSearch instead of IMGT/LocusView).
You can then retrieve the sequences containing the genes of interest
(clicking on the clones - in blue - gives access to the entries in IMGT/LIGM-DB, and therefore to the sequences in FASTA format).
You can then analyse the sequences with a tool such as
Which gene to choose, in IMGT/V-QUEST results, when two genes give an identical score?
A look at the IMGT/V-QUEST alignment is useful to check where the differences are between the input sequence and the two germline genes, and ultimately to decide which gene to choose.
For the human IGKV genes, if two germline genes, one from the proximal cluster and one from the distal cluster, give an identical score with the input sequence, it is preferable to select the gene of the proximal cluster (as genes of the distal cluster are rarely used).
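The tie-breaking rule for human IGKV genes can be sketched as follows. This is only an illustration: the function name is hypothetical, and the sketch assumes that genes of the distal cluster are recognized by the letter "D" that follows the subgroup number in their name (for example IGKV1D-39 versus IGKV1-39).

```python
import re

# Assumption: distal-cluster human IGKV genes carry a "D" after the
# subgroup number in their name (e.g. IGKV1D-39).
DISTAL_IGKV = re.compile(r"^IGKV\d+D-")


def prefer_proximal(tied_genes):
    """Among IGKV genes with identical scores, return a proximal-cluster
    gene if there is one, otherwise the first gene of the list."""
    proximal = [gene for gene in tied_genes if not DISTAL_IGKV.match(gene)]
    return proximal[0] if proximal else tied_genes[0]


# Two candidate genes with an identical score (example names).
print(prefer_proximal(["IGKV1D-39*01", "IGKV1-39*01"]))
```

The alignment should still be checked by eye, as the rule only reflects the rare usage of the distal cluster genes.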
How to analyse comparison between IG V sequences from a species with human ones?
You can compare:
the IG V gene sequences from a species with the human ones using IMGT/V-QUEST.
the IG V amino acid alignment with the human IGHV, IGKV and IGLV IMGT Protein display (germline sequences) to identify if unusual amino acids found in the analysed sequences are found in human V genes.
the IG V gene sequences from a species with the human IG productively rearranged V sequences (Pommié C. et al. 2004):
How many of the 19 conserved positions (chemical characteristics in Table 2C and Table 3A of Pommié et al 2004) between human IGHV and IGKV/IGLV are conserved in the analysed sequences?
How many of the 41 conserved positions between human IGKV and IGLV (chemical characteristics in Table 2C and Table 3B- yellow+pink+light green) are conserved in the analysed sequences?
Are the specific positions in human IGKV or IGLV conserved in the analysed sequences? For human IGKV, for instance, the four specific positions are 7: hydroxyl, 24: basic, 86: acidic, and 87: F (Pommié C. et al. 2004).
How to determine the CDR3-IMGT length of a germline V gene?
The length of the CDR3-IMGT is expressed in number of amino acids or number of complete codons, following the 2nd-CYS at position 104.
Between the end of the last complete codon in 3' and the V-HEPTAMER, there are frequently one or two nucleotides. This(ese) nucleotide(s) belong(s) to the V-REGION and are taken into account for a nucleotide comparison, but they are not considered in the CDR3-IMGT length.
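As a minimal sketch of this rule (the function name and the toy sequence are hypothetical and not taken from any IMGT tool), the germline CDR3-IMGT length can be computed from the V-REGION nucleotides located after the 2nd-CYS codon and before the V-HEPTAMER:

```python
from typing import Tuple


def germline_cdr3_length(nt_after_2nd_cys: str) -> Tuple[int, int]:
    """Return (CDR3-IMGT length in complete codons, leftover nucleotides).

    `nt_after_2nd_cys` is the part of the V-REGION that follows the codon
    of 2nd-CYS 104 and precedes the V-HEPTAMER.
    """
    complete_codons, leftover = divmod(len(nt_after_2nd_cys), 3)
    return complete_codons, leftover


# Toy example (hypothetical sequence): 8 nucleotides follow the 2nd-CYS codon.
length, extra = germline_cdr3_length("gcgagaga")
print(length, extra)
```

The leftover nucleotides (one or two) remain part of the V-REGION for nucleotide comparisons, but they do not count towards the CDR3-IMGT length.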
How to represent CDR-IMGT lengths?
CDR-IMGT lengths should be represented as described in Lefranc et al. 2003
(PMID: 12477501), that is, as numbers separated by dots, between brackets.
For example, [6.3.7] indicates that the CDR1-IMGT has a length of 6 amino acids, the CDR2-IMGT has a length of 3 amino acids and the CDR3-IMGT has a length of 7 amino acids. The same type of representation is used for germline and rearranged genes, and the information (subgroup or gene name, and configuration) should be provided with the CDR-IMGT lengths.
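For scripts that handle this notation, a small parser may help. The sketch below is an illustration only; the function name is hypothetical and it only covers the simple case of three integer lengths.

```python
import re


def parse_cdr_lengths(text):
    """Parse a CDR-IMGT length string such as '[6.3.7]' into a tuple
    (CDR1-IMGT, CDR2-IMGT, CDR3-IMGT) of integers."""
    match = re.fullmatch(r"\[(\d+)\.(\d+)\.(\d+)\]", text.strip())
    if match is None:
        raise ValueError(f"not a CDR-IMGT length string: {text!r}")
    return tuple(int(group) for group in match.groups())


cdr1, cdr2, cdr3 = parse_cdr_lengths("[6.3.7]")
print(cdr1, cdr2, cdr3)
```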
What are the recommendations for correctly representing V-REGION IMGT Protein displays?
Is it possible to retrieve flanking sequence at the 5' and/or 3' ends of IMGT labels that describe IMGT/GENE-DB annotated sequences?
Yes, flanking sequences at the 5' and/or 3' ends of the IMGT labels can be retrieved, in FASTA format,
by querying the IMGT/GENE-DB entry section, "Choose your display > IMGT label extraction
from IMGT/LIGM-DB reference sequences".
For more information: IMGT label extraction
from IMGT/LIGM-DB reference sequences.
What are "P" nucleotides in a V-J or V-D-J junction?
"P" nucleotides refer to nucleotides that are found in V-J or V-D-J junctions and that are
palindromic to the last (3') nucleotides of the germline V-REGION, to the first (5')
nucleotides of the germline J-REGION, or to the (5' or 3') ends of the germline D-REGION of immunoglobulin
or T cell receptor genes. "P" nucleotides are only identified in junctions in which the V-REGION, D-REGION
or J-REGION has not been submitted to the exonuclease activity (intact ends of
the respective V, D, and J coding regions).
Formation of "P" nucleotides:
The "P" nucleotides result from the opening of the DNA hairpin formed during the V-(D)-J rearrangement,
when this opening does not occur exactly at the tip of the hairpin ("P" nucleotides can only be identified
if this hairpin opening is not followed by exonuclease activity on the sequence ends).
Cut at the end of the 3' V-REGION or at the 3' D-REGION (during the V-(D)-J rearrangement):
5' T - C - A - G
3' A - G - T - C
Link between G and C forming a hairpin:
5' T - C - A - G
3' A - G - T - C
Opening of the hairpin. If the cut occurs, for instance, between C and A on the upper strand:
5' T - C
3' A - G - T - C - G - A
The upper strand is completed. In the final sequence, C - T are designated as "P" nucleotides:
5' T - C - A - G - C - T
3' A - G - T - C - G - A
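The palindromic relationship can be checked with a short script. This is a sketch under stated assumptions: the function names are hypothetical, the coding end is assumed to be intact (no exonuclease trimming), and a match may also arise by chance from N nucleotides, which the script cannot distinguish.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def p_nucleotides_after(region_3prime_end, following):
    """Return the nucleotides at the start of `following` that are
    palindromic to the 3' end of an intact germline region."""
    palindrome = reverse_complement(region_3prime_end)
    count = 0
    for expected, observed in zip(palindrome, following):
        if expected.upper() != observed.upper():
            break
        count += 1
    return following[:count]


# Same toy end as in the scheme above: the region ends with T-C-A-G and
# the junction continues with C-T followed by unrelated nucleotides.
print(p_nucleotides_after("TCAG", "CTCCA"))
```

With the toy sequences used here, the script reports C-T, the two nucleotides designated as "P" nucleotides in the scheme.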
Lafaille J.J. et al. Cell, 59: 859-870 (1989)
Lewis S.M. Proc Natl Acad Sci U S A., 91: 1332-1336 (1994)
How are the CDR lengths defined in IMGT?
The CDR-IMGT length is based on the IMGT unique numbering for V-DOMAIN.
This numbering has been defined following extensive alignment analysis and taking into account the structural data.
The rules for the IMGT unique numbering are described in:
Lefranc, M.-P. et al. "IMGT unique numbering for immunoglobulin and T cell receptor variable domains and Ig superfamily V-like domains". Dev. Comp. Immunol., 27, 55-77 (2003)
For example, the CDR-IMGT lengths of 1hzh
are, for the VH domain [8.8.20], and for the V-KAPPA domain [7.3.9].
In the IMGT/3Dstructure-DB card, you can click and see the IMGT Colliers de Perles, on one or two layers.
In the IMGT Collier de Perles on two layers, hydrogen bonds experimentally determined from 3D structures are represented by green lines.
In practice we only use the IMGT unique numbering (and have been using it for more than 8 years), as it allows us to deal remarkably well with the sequences and structures. Moreover, we have strong evolutionary evidence that it is correct.
What are the differences between IgBlast and IMGT/V-QUEST?
The algorithms behind IgBlast and IMGT/V-QUEST are different, and therefore the scores are different.
However, the main differences are more on the biological side, and therefore on the interpretation that derives from it:
IMGT/V-QUEST uses the standardized IMGT nomenclature (gene name), which was approved by the Human Genome Organisation (HUGO) Gene
Nomenclature Committee (HGNC) in 1999, and entered in LocusLink at NCBI in 2000, and now in Entrez Gene at NCBI (with direct links
IMGT/V-QUEST uses the delimitations of the frameworks (FR-IMGT) and complementarity determining regions (CDR-IMGT), which are
identical whatever the receptor type (IG and TR) whatever the chain type and whatever the species.
This is quite important when determining the number of IG mutations in the different regions.
Last but not least, IMGT/V-QUEST provides the information by comparison with all the available alleles. Defining
the alleles represents a huge amount of work, but a very valuable one, as it allows the germline diversity to be taken into account.
Is the IMGT gene name valid for both genomic and cDNA sequences?
Yes, the IMGT gene name (CLASSIFICATION concept of IMGT-ONTOLOGY) is valid for both genomic DNA and cDNA sequences. It
is also valid for amino acid sequences, and protein 2D (IMGT Colliers de Perles) and 3D structures.
Note that the names of the genes and subgroups depend on the species. For example, in human, TRAV5 is also the subgroup
name (as there is only one gene in the subgroup). In contrast, in mouse, TRAV5 is only the subgroup name (as there are
several genes in the subgroup).
The lists of the genes per locus and per species are in
Gene Tables (IMGT Repertoire for IG and TR).
All the human and mouse IG and TR genes are known. The "Gene tables" lists are therefore comprehensive. All the human
and mouse genes are also available in IMGT/GENE-DB. Known IMGT/LIGM-DB cDNA sequences for each gene are available in a
section at the bottom of each IMGT/GENE-DB gene entry.
When do we use "V-ALPHA" or "V-BETA"?
V-ALPHA and V-BETA are domain labels, but these labels are also used for the corresponding nucleotide and amino acid
sequences. Thus, V-ALPHA refers to the TRA V-J-REGION and V-BETA refers to the TRB V-D-J-REGION (nucleotide or amino acid
sequence, protein 2D and 3D structure).
V-ALPHA, V-BETA, V-J-REGION and V-D-J-REGION are written in capital letters as they are IMGT standardized labels
(DESCRIPTION concept of IMGT-ONTOLOGY). Labels are independent of the species.
The V-ALPHA and V-BETA amino acid sequences extend from the V-REGION amino acid 1 to the J-REGION most downstream amino
acid. The V-ALPHA and V-BETA nucleotide sequences extend from the V-REGION codon 1 to the J-REGION most downstream
nucleotide (that is the nucleotide downstream of the most 3' codon).
The V-ALPHA and V-BETA labels, as the IG V-DOMAIN labels, are important for the representation of the variable domain 3D
structures in IMGT/3Dstructure-DB. For instance, in the entry 1ao7,
they make it possible to represent the IMGT Collier de Perles of the V-ALPHA and V-BETA, according to the
IMGT unique numbering for V-DOMAIN (NUMEROTATION concept of IMGT-ONTOLOGY).
How should the V-ALPHA and V-BETA of rearranged cDNA sequences, amino acid sequences, protein 2D and 3D structures be referred to?
The V-ALPHA and V-BETA of rearranged cDNA sequences, amino acid sequences, protein 2D (IMGT Collier de Perles) and 3D
structures are referred to in an identical way, that is, species, gene names and CDR-IMGT lengths.
For instance, taking the example of 1ao7:
the V-ALPHA of the 3D structure but also of the corresponding rearranged cDNA sequence, amino acid sequence and protein 2D
is referred to as Homo sapiens TRAV12-2-TRAJ24 [6.5.11].
the V-BETA of the 3D structure but also of the corresponding rearranged cDNA sequence, amino acid sequence and protein 2D is
referred to as Homo sapiens TRBV6-5-TRBD2-TRBJ2-7 [5.6.14].
Homo sapiens, cDNA or amino acid, or protein: to identify.
V-ALPHA, or V-BETA: to describe.
TRAV12-2-TRAJ24, or TRBV6-5-TRBD2-TRBJ2-7: to classify.
[6.5.11] or [5.6.14]: to number.
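The way these elements combine into one designation can be sketched as follows (the function name and its parameters are hypothetical; the values are those of the 1ao7 example above):

```python
def describe_v_domain(species, genes, cdr_lengths):
    """Build a designation made of the species, the gene names joined
    by hyphens, and the CDR-IMGT lengths between brackets."""
    gene_part = "-".join(genes)
    length_part = "[" + ".".join(str(n) for n in cdr_lengths) + "]"
    return f"{species} {gene_part} {length_part}"


# V-ALPHA and V-BETA of 1ao7, as given in the text above.
print(describe_v_domain("Homo sapiens", ["TRAV12-2", "TRAJ24"], (6, 5, 11)))
print(describe_v_domain("Homo sapiens", ["TRBV6-5", "TRBD2", "TRBJ2-7"], (5, 6, 14)))
```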
Is it possible to study rearrangements of a pseudogene or unusual sequences (such as translocated IG with other genes)?
Yes, it is possible to study rearrangements of pseudogenes or unusual sequences (such as translocated IG with other genes)
for the species whose reference sequences are in IMGT/GENE-DB, that is human and mouse. On these sequences, IMGT/V-QUEST
does not work, but you can use BLAST2 at CINES on IMGT/GENE-DB reference sequences.
How to find the correspondence between a "previous" gene name and the current IMGT gene name?
You have two ways to find the correspondences:
either go to the "Gene tables" in IMGT Repertoire. For example, for a human IGHV gene, open
Gene table: human (Homo sapiens) IGHV
in IMGT Repertoire and use "Find in This page" (Ctrl+F) to search for the "previous" gene name (or its accession number, if known).
or use Google "IMGT domain", on the IMGT Home page, with the "previous" gene name (or its accession number, if known), which leads you to "Alignments of alleles". For example:
Alignment of alleles: human (Homo sapiens) IGHV4-34
This approach is valid for any old name of genomic genes which are in the IMGT Repertoire (comprehensive for human and mouse).
A different number of exons (polymorphism by insertion/deletion) is also sufficient to define a new allele.
For example, Homo sapiens IGHG3*11 and IGHG3*12 have identical genomic sequences for the coding regions they share,
but they differ by the number of hinge exons. The IGHG3*11 allele has four hinge (H1, H2, H3, H4) exons, whereas
the IGHG3*12 allele has three hinge exons (H1, H2, H4).
Can new alleles be submitted from NGS?
'Putative' alleles identified by NGS from V-(D)-J rearrangements cannot be accepted as new alleles.
They should be confirmed.
Authors are strongly encouraged to confirm the nucleotide differences by genomic sequencing of the corresponding germline gene in order to keep an updated IMGT reference directory for the scientific community.
Is it possible to identify IMGT/LIGM-DB sequences associated to a PubMed abstract?
Yes, it is possible, by 'LinkOut' at NCBI, to access IMGT/LIGM-DB sequences associated to a PubMed abstract.
Following a query at NCBI (http://www.ncbi.nlm.nih.gov/)
with an accession number (select Nucleotide), click on 'Links', then 'LinkOut', then 'The international ImMunoGeneTics database'.
Following a query at NCBI with an author (select PubMed), click on 'Links', then on 'Nucleotide' (if present),
you will obtain the list of GenBank accession numbers associated to the abstract.
For each result, click on 'Links', then 'LinkOut', then 'The international ImMunoGeneTics database'.
How to summarize the fact that the FR-IMGT and CDR-IMGT delimitations represent the standard for FR and CDR?
The FR-IMGT and CDR-IMGT delimitations are based on the IMGT® standardization, and more particularly
on the IMGT unique numbering.
That standardization takes into account the structural data. The CDR-IMGT correspond to the loops of the variable domains.
That standardization is used whatever the species, the receptor (immunoglobulin or T cell receptor), and the chain (heavy, kappa, lambda for the IG; alpha, beta, gamma, delta for the TR).
IMGT Colliers de Perles are 2D graphical representations, based on the FR-IMGT and CDR-IMGT delimitations. IMGT Colliers de Perles of antibodies with known 3D structures are available in IMGT/3Dstructure-DB.
Are the IMGT gene names the official ones?
Yes, the IMGT® gene names are the official ones.
IMGT® is the international reference in ImMunoGeneTics and has delegation from the
HUGO Gene Nomenclature Committee (HGNC) for the IG and TR genes.
All the IMGT® gene names for human were approved by HGNC in 1999 and entered in GDB,
and in Entrez Gene at NCBI, with links to IMGT/GENE-DB.
The IMGT-NC works in close collaboration with HGNC
and is under the aegis of the IUIS.
How can I retrieve the V leader sequences from IMGT reference sequences?
Step 1: Make your selection (species, group, functionality) in
(access from http://www.imgt.org).
For a selection "Homo sapiens", "IGHV" and "functional", the results of your search will be, for example:
Step 2: Select all genes (click in box at the bottom of the list of resulting genes)
and in the "Choose your display" "IMGT label extraction from IMGT/LIGM-DB reference sequences" section,
click on "Choose label(s) for extraction" and select the IMGT label "L-PART1+L-PART2"
(L-PART1 and L-PART2 being shown as artificially spliced in that query).
Results for "Nucleotide sequences" will be shown as follows:
Does IMGT allow one to make multiple alignments and derive consensus sequences?
IMGT has no specific multiple alignment tools.
The curators use BLAST and CLUSTAL.
However the priority is always to maintain the gaps according to the IMGT unique numbering.
If there is a conflict between the IMGT Protein displays and the BLAST/CLUSTAL results,
the gaps are adjusted manually.
How are the positions of gaps and insertions placed in IMGT Collier de Perles?
In IMGT Collier de Perles, the positions of gaps and insertions are always at the top of the CDR3-IMGT loop.
The two CDR3-IMGT anchor positions are Cysteine (C) (F strand) at position 104 of FR3-IMGT, and Phenylalanine (F)
or Tryptophan (W) (G strand) at position 118 of FR4-IMGT (the F and G strands in 3D structures are antiparallel
strands with conserved hydrogen bonds). See
IMGT unique numbering for V-DOMAIN and V-LIKE-DOMAIN.
This numbering has been validated by superposition of 3D structures of variable domains with different CDR3-IMGT lengths.
CDR1-IMGT and CDR2-IMGT
For structural data, as recommended in Lefranc M.-P. et al. 2003,
gaps may be placed at the top of the CDR1-IMGT and CDR2-IMGT loops
(as it is done in IMGT/3Dstructure-DB).
However, working on sequences it is usually easier to have the gaps at the end of CDR1-IMGT and CDR2-IMGT,
as it is done in IMGT/V-QUEST. This allows an easier comparison of CDR-IMGT lengths according to subgroups
and avoids splitting small CDR into two parts (for example, the CDR2-IMGT of IGKV, which has only three amino acids).
Is it possible to search an amino acid sequence against the IMGT reference directory?
You can search an amino acid sequence against the IMGT domain reference directory, using
You will get the IMGT Collier de Perles of your domain by clicking on "IMGT Collier de Perles" at the bottom of the results page.
For the IG and TR V-DOMAIN, IMGT/DomainGapAlign provides the alignment with the closest V-REGION and the displayed
"IMGT Collier de Perles" corresponds to the V-REGION in your sequence. To obtain an IMGT Collier de Perles for the
complete V-DOMAIN (V-D-J-REGION or V-J-REGION), you need:
to complete your sequence in the window with the CDR3 and the J (at least 9 or 10 amino acids beyond the
F or W, respectively of the motif F/WGXG, to get the complete J)
to add gaps (if the CDR3-IMGT is <13 amino acids), or additional positions
(if the CDR3-IMGT is >13 amino acids).
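The number of gaps or additional positions follows directly from the CDR3-IMGT length. The sketch below is an illustration with a hypothetical function name; it relies on the 13 positions (105 to 117) that lie between the anchor positions 104 and 118, and it does not decide where in the loop the gaps or additional positions are placed.

```python
# Positions 105 to 117, between the anchors 104 and 118.
CDR3_IMGT_REFERENCE_LENGTH = 13


def cdr3_adjustment(cdr3_length):
    """Return what has to be added to a CDR3-IMGT of the given length
    (in amino acids) and how many: gaps or additional positions."""
    difference = cdr3_length - CDR3_IMGT_REFERENCE_LENGTH
    if difference < 0:
        return ("gaps", -difference)
    if difference > 0:
        return ("additional positions", difference)
    return ("none", 0)


print(cdr3_adjustment(9))   # a short CDR3-IMGT needs gaps
print(cdr3_adjustment(15))  # a long CDR3-IMGT needs additional positions
```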
How to find IMGT unique numbering for a VH or VL protein domain?
IMGT/DomainDisplay allows you to query and display
available domains from the IMGT domain directory
(domain amino acid sequences according to the IMGT unique numbering and in the format of IMGT Protein display).
IMGT/DomainGapAlign allows you to create gaps in your
own amino acid sequence, according to the IMGT unique numbering,
for V-REGION or C-DOMAIN, by aligning your sequence with the closest germline V-REGION or with
the closest C-DOMAIN (from the IMGT domain directory).
The tool provides the IMGT Collier de Perles of your V-REGION or
C-DOMAIN by clicking on "IMGT Collier de Perles"
at the bottom of the results page. You can also obtain an IMGT Collier de Perles for the complete IG and TR V-DOMAIN
(V-D-J-REGION or V-J-REGION).
How to obtain IMGT/LIGM-DB entries while querying PubMed?
You can obtain IMGT/LIGM-DB entries for IG and TR sequences using LinkOut at NCBI:
Is there information on the frequency and population use of particular V, D and J genes?
There are no direct data on the frequency and population use of
particular V, D or J genes; however, there are some indicators of their
expression, based on the number of sequences found in IMGT/LIGM-DB.
This is illustrated by:
the IMGT/GENE-DB tables of rearranged annotated cDNA and gDNA for each gene
(at the bottom of each entry).
Regarding the alleles, as all known germline sequences are displayed
in Alignments of alleles (in IMGT Repertoire),
the number of germline sequences for each allele is also an indicator
of the possible frequency of the different alleles in the population.
How is the CDR3 delimited?
The CDR3 is delimited by (but does not include) the anchor positions 2nd-CYS 104 and J-PHE or J-TRP 118.
As for the CDR1 and CDR2 these anchor positions (that belong to the neighbouring FR) are shown as squares
in the IMGT Collier de Perles.
The JUNCTION includes 2nd-CYS 104 and J-PHE or J-TRP 118 and is therefore two amino acids longer than the CDR3.
the J-PHE or J-TRP belongs to the characteristic J-REGION motif 'F/W-G-X-G' at positions 118-121
the CDR3 is delimited by the same anchor positions (2nd-CYS 104 and J-PHE or J-TRP 118), whatever
the receptor type (IG or TR), the chain type (heavy or light for IG; alpha, beta, gamma or delta for TR)
or the species.
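The relation between JUNCTION and CDR3 can be illustrated with a short sketch. The function name and the toy amino acid sequence are hypothetical, and the check on the anchors is deliberately strict (C at 104, F or W at 118):

```python
def cdr3_from_junction(junction_aa):
    """Return the CDR3 contained in a JUNCTION amino acid sequence by
    removing the two anchors (2nd-CYS 104 and J-PHE or J-TRP 118)."""
    if len(junction_aa) < 2:
        raise ValueError("a JUNCTION contains at least the two anchors")
    if junction_aa[0] != "C":
        raise ValueError("expected 2nd-CYS (C) as first amino acid")
    if junction_aa[-1] not in ("F", "W"):
        raise ValueError("expected J-PHE (F) or J-TRP (W) as last amino acid")
    return junction_aa[1:-1]


junction = "CARDYW"  # toy JUNCTION, not a database sequence
cdr3 = cdr3_from_junction(junction)
print(cdr3, len(junction) - len(cdr3))
```

The difference in length between the JUNCTION and the CDR3 is always two amino acids, whatever the receptor type, the chain type or the species.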
How can one appropriately and clearly number amino acids in the constant region of immunoglobulins, for example the LL of IgG1 in position 4 and 5 of CH2 (...APELLGGP...)?
The LL of IgG1 in position 4 and 5 of CH2 (...APELLGGP...) are at positions CH2 1.3 and 1.2
according to the IMGT unique numbering for C-DOMAIN.
Lefranc, M.-P. et al., Dev. Comp. Immunol., 29, 185-203 (2005)
(approved as standard by the WHO/IUIS committee and WHO/INN).
In IMGT®, the numbering of the amino acids in the IG constant region is given per domain as it is the
only way to compare sequences in a standardized way between different isotypes, and this numbering
is independent of the variable domain length.
Correspondence, per domain, with the IMGT unique numbering, can be obtained using the IMGT/DomainGapAlign tool.
How to find cDNA or rearranged gDNA using a given human or mouse V gene?
You will find cDNA and rearranged gDNA using a given human or mouse gene by querying IMGT/GENE-DB.
In the IMGT/GENE-DB Query page, use SHORT CUT: Select the 'Species' and type the 'IMGT gene name', then 'do the search'.
In the resulting page, select the gene, then 'Do the search'.
At the bottom of the IMGT/GENE-DB entry, you will find links to tables displaying lists of annotated IMGT/LIGM-DB cDNA and rearranged gDNA, using that gene.
You can access the same tables starting from the IMGT/GENE-DB Query
Click on 'IMGT/GENE-DB direct links for a given gene' (at the bottom of the Query page), and then in the resulting page select the appropriate format:
Why do "Protein displays" and "Colliers de Perles" of variable genes not include the last strand G?
The strand G is not shown in "Protein displays" and "Colliers de Perles" of variable germline genes, as this strand is contributed by the J-REGION, which is brought in by the V-J or V-D-J rearrangement.
The alignments of the J genes (that contribute to the G strand) and those of the D genes (for IGH, TRB and TRD)
(that contribute to the CDR3-IMGT) are available in the IMGT Repertoire.
You have to be aware that, at the V-J and V-D-J junctions, there is trimming of the V, (D) and J ends,
and random addition of N nucleotides, which create the diversity of the junctions
(see The T cell receptor FactsBook).
How to retrieve the recombination signals (RS) from the IMGT® databases?
To retrieve recombination signals (RS), for instance the human V-RS, from the IMGT® databases:
Example: Species: 'Homo sapiens', Group: 'IGHV', Functionality: 'Functional'
Do the search
In the result page:
Select all genes
then at the bottom of the page in: IMGT label extraction from IMGT/LIGM-DB reference sequences
Choose label(s) for extraction
You can proceed in the same way for J-RS (Group: IGHJ), 5'D-RS and 3'D-RS (Group: IGHD).
You can also proceed in the same way for the other IG and TR loci.
How to find the hinge sequences of IG heavy chains of many species easily?
You can query IMGT/LIGM-DB by selecting in Taxonomy:
Loci, genes or chains: 'Ig-heavy' then Do the search
You will get 78786 results (in January 2010).
On the same page click on 'Subsequences'.
In next page click in the list of labels on 'H'
and then on 'Get translated sequences (Fasta)'.
You will get 228 sequences.
For hinges encoded by several exons, you need to query individually each exon, that is you have to go back one page
and now click in the list of labels on 'H1'
and repeat the same operation
and again for 'H2', 'H3' and 'H4'.
Do the search.
In the resulting page, you will get a list of 52 genes.
Select manually the IGHA, IGHD and IGHG genes of interest.
Click on the radio button of 'Choose label for extraction'.
Select H, H1, H2, H3, H4 (together).
Click on the radio button of: 'Amino acid sequences'
(see results below after selecting all IGHA, IGHD and IGHG).
You can do the same kind of query for nucleotide sequences:
Click on the radio button of: 'Nucleotide sequences'.
Why is the delimitation of IGKJ2 given as 1094-1132 in V00777 from IMGT/LIGM DB, whereas it is 1096-1134 found from the literature source?
From V00777 from IMGT/LIGM-DB:
W T F G G G T K L E I K
Y T F G G G T K L E I K
The definition of the IMGT label "J-REGION" includes the additional 1 or 2 nucleotides that can be found following the J-HEPTAMER.
Indeed these nucleotides can be found in V-(D)-J junctions when the rearrangement occurs without trimming of the J region.
IMGT label definition:
J-REGION: coding region of J-GENE (plus 1 or 2 nucleotide(s) after J-HEPTAMER, if present) or corresponding region in cDNA
In genomic sequences J-HEPTAMER and J-REGION are therefore contiguous (numbers in green below).
As these additional nucleotides are taken into account in IMGT, "codon_start" is added to indicate on which nucleotide (nt), in the IMGT coding label, should start the translation by automatic tools:
codon_start=2 means that it is the 2nd nt (the "t" of tgg) for IGKJ1*01
codon_start=3 means that it is the 3rd nt ("t" of tac) for IGKJ2*01.
These nucleotides are shown in bold in the sequences above.
FT J-HEPTAMER 733..739
FT J-REGION 740..777
FT J-HEPTAMER 1087..1093
FT J-REGION 1094..1132
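The effect of "codon_start" on the translation can be sketched as follows. The function name is hypothetical and the two sequences are short toy fragments chosen to mimic the beginning of a J-REGION; they are not the sequences of V00777.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Standard genetic code, built from the classical T, C, A, G ordering.
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate_from_codon_start(region, codon_start):
    """Translate a coding label, starting at the nucleotide given by
    codon_start (1-based); an incomplete last codon is ignored."""
    seq = region.upper()[codon_start - 1:]
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


# One extra nucleotide before the first codon: codon_start=2.
print(translate_from_codon_start("gtggacgttc", 2))
# Two extra nucleotides before the first codon: codon_start=3.
print(translate_from_codon_start("tgtacactttt", 3))
```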
How to obtain from IMGT/GENE-DB complete sequences of a constant gene, or of a group of C genes?
To get the complete amino acid sequence (artificially spliced) of a reference constant gene, the query is 13.2,
with the gene name (for instance IGHG1) for Genesymbol, and the Latin name of the species (for instance Homo+sapiens) for Species.
To get the artificially spliced nucleotide sequence of that gene, the query is 13.1.
To get the complete amino acid sequences (artificially spliced) of the reference constant genes of a group, the query is 14.2,
with the group name (for instance IGHC) for Group, and the Latin name of the species (for instance Homo+sapiens) for Species.
To get the artificially spliced nucleotide sequences of that group, the query is 14.1.
Information on the direct links is available at: http://www.imgt.org/genedb/share/textes/GENEDBDirectLinks.html
You can access that page at the bottom of the IMGT/GENE-DB Query page.
How can I get the sequences of the recombination signals?
To get the recombination signals:
Example: Species: Homo sapiens, Group: IGHV, Functionality: Functional
Do the search.
In the result page:
Select all genes
then at the bottom of the page in: 'IMGT label extraction from IMGT/LIGM-DB reference sequences'
'Choose label(s) for extraction':
For J genes: the label is J-RS
For D genes: the labels are 5'D-RS and 3'D-RS.
What does the letter 'S' mean in the temporary IMGT gene nomenclature, for example in IGHV1S1 or IGHJ1S1?
What are the functionalities of rearranged V-(D)-J genes compared to germline V, D or J genes?
There are two possible functionalities for rearranged V-(D)-J genes, either productive or unproductive.
In contrast, the germline V, D or J genes, and the C genes, have three possible functionalities (as the conventional genes):
functional, ORF or pseudogene.
A complete identification of rearranged V-(D)-J genes requires the identification of the two (or three) genes and alleles which are
involved in the rearrangement.
This can be determined using IMGT/V-QUEST. IMGT/V-QUEST provides the functionality of the rearranged V-(D)-J gene.
It also provides the gene and allele names of the closest V, D (in the IMGT/JunctionAnalysis section) and J genes involved in the rearrangement
and their germline functionality.
How to deal with 'complement' sequences?
We can take as an example AL122127. AL122127 in the nucleotide databases 'GEDI' (GenBank, ENA, DDBJ and IMGT/LIGM-DB) is a genomic sequence
of 169,802 nucleotides (FASTA in IMGT/LIGM-DB AL122127).
AL122127 contains several constant IGHC genes, including Homo sapiens IGHG1*05 and IGHG3*10, as shown in IMGT/LIGM-DB 'Annotation' (DESCRIPTION section)
and the sequence is 'complement' with respect to the orientation of the IGHC genes.
The IMGT labels positions are indicated for the 'complement' sequence (for example, complement(24528..169802) for the IGHG3 gene),
however, for a more user-friendly display, the nucleotide sequence and the translation of each IMGT coding label are displayed in the gene
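To work with such an entry locally, the sequence of a label given as complement(start..end) can be obtained by taking the subsequence and reverse-complementing it. The sketch below uses a hypothetical function name and a toy sequence (not AL122127); positions are 1-based and inclusive, as in the label positions.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract_complement(sequence, start, end):
    """Return the reverse complement of sequence[start..end], with
    1-based inclusive positions as in 'complement(start..end)'."""
    subsequence = sequence[start - 1:end]
    return subsequence.translate(COMPLEMENT)[::-1]


toy_entry = "AAACCCGGGT"  # toy sequence for illustration only
# Equivalent of a label annotated as complement(2..5) on the toy entry.
print(extract_complement(toy_entry, 2, 5))
```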
In 'Alignments of alleles' why is the first nucleotide of the exons in 'pink' color?
In 'Alignments of alleles', the first nucleotide in 5' of an exon is shown in pink when it comes from the preceding exon and
contributes to the codon resulting from a splicing frame 1 (codon NNN from sf1) (see Splicing sites)
to encode the first amino acid of the translated region.