IMGT/LIGM-DB User Manual

Version 15, February 2004


CONTENTS

1. INTRODUCTION
2. IMGT/LIGM-DB ANNOTATION LEVEL
3. STRUCTURE OF AN ENTRY
4. LINE STRUCTURE
5. INDEX FILE FORMAT


1. INTRODUCTION

This document describes the format and conventions used in IMGT/LIGM-DB. An attempt has been made to make the collected data as easily accessible as possible without restricting their usefulness to a particular type of computing environment. The structure is based on the EMBL nucleotide sequence database flat file.

The database is composed of sequence entries. Each entry corresponds to a single contiguous sequence as contributed or reported in the literature. In many cases, entries have been assembled from several papers reporting overlapping sequence regions. Conversely, a single paper often provides data for several entries, as when homologous sequences from different organisms are compared.

2. IMGT/LIGM-DB ANNOTATION LEVEL

In IMGT/LIGM-DB there are two levels of annotation. Initially, the keyword information is checked and LIGM standardized keywords are assigned; this represents level K. Standardized keywords are assigned in the order of the keyword tree structure. Subsequent annotation involves the assignment of feature keys to the entries; the annotation level then becomes F. The annotation level of each entry is indicated on the first (ID) line of the entry.

Annotation level      Definition
--------------------------------------------------------------------------
- keyword level       entries to which standardized keywords are assigned
- by annotators       sequences annotated by IMGT experts
- automatic           automatically annotated with IMGT tools

3. STRUCTURE OF AN ENTRY

The main emphasis here is to describe the line types to distinguish different kinds of information. Since the IMGT/LIGM-DB representation follows closely that of EMBL, the line types used here are similar.
 
     ID - identification             (begins each entry; 1 per entry)
     AC - accession number           (>=1 per entry)
     DT - date                       (2 per entry)
     DE - description                (>=1 per entry)
     KW - keyword                    (>=1 per entry)
     OS - organism species           (>=1 per entry)
     OC - organism classification    (>=1 per entry)
     RN - reference number           (>=1 per entry)
     RC - reference comment          (>=0 per entry)
     RP - reference positions        (>=1 per entry)
     RX - reference cross-reference  (>=0 per entry)
     RA - reference author(s)        (>=1 per entry)
     RT - reference title            (>=1 per entry)
     RL - reference location         (>=1 per entry)
     DR - database cross-reference   (>=0 per entry)
     FH - feature table header       (>=1 per entry)
     FT - feature table data         (>=0 per entry)
     CC - comments or notes          (>=0 per entry)
     XX - spacer line                (many per entry)
     SQ - sequence header            (1 per entry)
     bb - (blanks) sequence data     (>=1 per entry)
     // - termination line           (ends each entry; 1 per entry)

 

A sample IMGT/LIGM-DB entry is shown below:





ID   MMTCRGBV1 IMGT/LIGM annotation : by annotators; RNA; ROD; 290 BP.
XX
AC   Z48588;
XX
DT   12-FEB-1996 (Rel. 3, arrived in LIGM-DB )
DT   04-JAN-2000 (Rel. 12, Last updated, Version 4)
XX
DE   M.musculus mRNA for T-cell receptor gammaB-V1 segment. ;
DE   RNA; rearranged configuration; TcR-Gamma; regular; functionality 
DE   productive; group TRGV; subgroup GV1. 
XX
KW   antigen receptor; immunoglobulin superfamily; TcR; TcR gamma-delta; 
KW   TcR-Gamma; variable; IMGT reference sequence; t cell receptor. 
XX
OS   Mus musculus (house mouse)
OC   Eukaryota; Metazoa; Chordata; Vertebrata; Mammalia; Eutheria; Rodentia; 
OC   Sciurognathi; Muridae; Murinae; Mus. 
XX
RN   [1]
RP   1-290
RX   MEDLINE; 96134008.
RA   Roger T.T., Morisset J., Seman M.;
RT   "Conservation of Tcrg-V5 and limited allelic sequence polymorphism of
RT   the other Tcrg-V genes used by mouse tissue-specific gd-T lymphocytes.";
RL   Immunogenetics 43:165-166(1996).
XX
RN   [2]
RP   1-290
RA   Roger T.T.;
RT   ;
RL   Submitted (03-MAR-1995) to the EMBL/GenBank/DDBJ databases.
RL   Thierry T.R. Roger, Lab. d'immunodifferenciation, Pr Seman, Universite
RL   Denis Diderot, 2, place Jussieu, 75251 Paris cedex 05, France
XX
DR   MGD; MGI:98631; Tcrg-V1
DR   EMBL; Z48588.
XX
FH   Key                 Location/Qualifiers 
FH   
FT   V-REGION            1..290>
FT                       /partial
FT                       /chromosome="13"
FT                       /cell_type="purified skin T cells"
FT                       /strain="L-I (Biozzi mice)"
FT                       /clone_lib="library M13mp19"
FT                       /clone="3.4"
FT                       /allele="TRGV1*08"
FT                       /gene="TRGV1"
FT                       /haplotype="Tcr-gB"
FT                       /tissue_type="skin"
FT                       /CDR_length="[8.6.X]"
FT                       /translation="QLKQTEVSVTRETDESAQISCIASLPDFGNTEIHWYRQKAK
FT                       QFEYLIYVQTNYNQRPLGGKHKKIEASKDFQTSTSTLKINYLKKEDEATYYCAVW
FT                       "
FT   FR1-IMGT            <1..72
FT                       /partial
FT                       /AA_IMGT="3 to 26"
FT                       /translation="QLKQTEVSVTRETDESAQISCIAS"
FT   1st-CYS             61..63
FT   CDR1-IMGT           73..96
FT                       /AA_IMGT="27 to 34"
FT                       /translation="LPDFGNTE"
FT   FR2-IMGT            97..144
FT                       /AA_IMGT="39 to 55, AA 49 missing"
FT                       /translation="IHWYRQKAKQFEYLIY"
FT   CONSERVED-TRP       103..105
FT   CDR2-IMGT           145..162
FT                       /AA_IMGT="56 to 61"
FT                       /translation="VQTNYN"
FT   FR3-IMGT            163..279
FT                       /AA_IMGT="66 to 104"
FT                       /translation="QRPLGGKHKKIEASKDFQTSTSTLKINYLKKEDEATYYC"
FT   2nd-CYS             277..279
FT   CDR3-IMGT           280..290>
FT                       /partial
FT                       /translation="AVW"
FT   JUNCTION            277..290>
FT                       /partial
FT                       /translation="CAVW"
XX
SQ   Sequence 290 BP; 114 A; 62 C; 52 G; 62 T; 0 other;
     cagctaaagc aaactgaagt atccgtcacc agagagacag atgagagtgc gcaaatatcc        60
     tgtatagctt ctcttccaga cttcggcaac acagaaatac actggtaccg gcaaaaagca       120
     aaacagtttg agtatctaat atatgtccaa acaaactaca atcaacgacc cttaggaggg       180
     aagcacaaaa aaattgaagc aagtaaagat tttcaaactt ctacctcaac cttgaaaata       240
     aattacttga agaaagaaga tgaagccacc tactactgtg cagtctggat                  290
//
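
As a minimal sketch (not part of the IMGT distribution), the lines of an entry such as the sample above can be grouped by the two-character line code that starts each line. The function name group_lines_by_code is hypothetical, the entry used here is abbreviated from the sample, and the assumption that the content starts in column 6 is read off the sample layout rather than stated by this manual.

```python
from collections import defaultdict


def group_lines_by_code(entry_text):
    """Group the lines of one flat-file entry by their two-character line code.

    Sequence data lines start with blanks and are stored under the key "  ".
    Reading stops at the "//" termination line.
    """
    groups = defaultdict(list)
    for line in entry_text.splitlines():
        if not line.strip():
            continue  # skip empty lines around the entry
        code = line[:2]
        if code == "//":
            break
        # Assumption: content starts in column 6 (code plus three blanks).
        groups[code].append(line[5:].rstrip())
    return dict(groups)


# Abbreviated from the sample entry (most line types omitted).
entry = """\
ID   MMTCRGBV1 IMGT/LIGM annotation : by annotators; RNA; ROD; 290 BP.
XX
AC   Z48588;
XX
DT   12-FEB-1996 (Rel. 3, arrived in LIGM-DB )
DT   04-JAN-2000 (Rel. 12, Last updated, Version 4)
XX
//
"""

groups = group_lines_by_code(entry)
print(groups["AC"])       # ['Z48588;']
print(len(groups["DT"]))  # 2
```

Keeping the lines grouped by code makes it straightforward to hand each group to a parser dedicated to that line type.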





Some entries will not be the current versions of the sequence represented in EMBL. Note that although the accession number on the AC line is the same as in EMBL, the data in the IMGT/LIGM-DB entry are not identical to those in the EMBL entry. The differences appear in the ID line, AC line, DT lines, DE lines, KW lines and FT lines; all other information is derived from EMBL. Consequently, only the structure of these line types is described here.

4. LINE STRUCTURE

  4.1 The ID (IDentification) line
The ID (IDentification) line is always the first line of an entry. The general form of the ID line is:

ID   entry name  IMGT/LIGM annotation : annotation level; molecule; division; length BP.

  4.2 The AC (ACcession number) line
The AC (ACcession number) line lists the accession number associated with this entry. For clarity, the identifier is kept the same as in the EMBL nucleotide sequence database. Since no merging or splitting of entries is done, only one accession number is allowed, followed by a semicolon.
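
The ID and AC line forms described in sections 4.1 and 4.2 could be read as in the following sketch. The regular expression is modelled on the ID line of the sample entry in section 3, so its spacing rules are an assumption; the names parse_id_line and parse_ac_line are hypothetical.

```python
import re

# Pattern modelled on the sample ID line; exact spacing rules are an assumption.
ID_PATTERN = re.compile(
    r"^ID\s+(?P<entry_name>\S+)\s+IMGT/LIGM annotation\s*:\s*(?P<annotation_level>[^;]+);"
    r"\s*(?P<molecule>[^;]+);\s*(?P<division>[^;]+);\s*(?P<length>\d+) BP\.\s*$"
)


def parse_id_line(line):
    """Split an ID line into entry name, annotation level, molecule, division, length."""
    match = ID_PATTERN.match(line)
    if match is None:
        raise ValueError(f"unrecognised ID line: {line!r}")
    fields = {name: value.strip() for name, value in match.groupdict().items()}
    fields["length"] = int(fields["length"])
    return fields


def parse_ac_line(line):
    """Return the single accession number of an AC line, without the semicolon."""
    if not line.startswith("AC"):
        raise ValueError(f"unrecognised AC line: {line!r}")
    return line[2:].strip().rstrip(";").strip()


fields = parse_id_line(
    "ID   MMTCRGBV1 IMGT/LIGM annotation : by annotators; RNA; ROD; 290 BP."
)
print(fields["entry_name"], fields["annotation_level"], fields["length"])
# MMTCRGBV1 by annotators 290
print(parse_ac_line("AC   Z48588;"))  # Z48588
```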

  4.3 The DT (DaTe) line
There are two DT (DaTe) lines, formatted as follows:

DT   DD-MON-YYYY (Rel. #, Created)
DT   DD-MON-YYYY (Rel. #, Last updated, Version #)
The first DT line indicates the date the entry was created in IMGT/LIGM-DB. The second DT line indicates the last revision of annotation in IMGT/LIGM-DB by an IMGT curator. The version number corresponds to the number of times the entry was validated by IMGT curators.
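
A minimal sketch of reading a DT line follows. It assumes the layout of the two DT lines in the sample entry of section 3 (where the first line reads "arrived in LIGM-DB" rather than "Created"); the function name parse_dt_line is hypothetical, and the release is kept as text so that any release numbering scheme is accepted.

```python
import re
from datetime import date

MONTHS = {
    name: number
    for number, name in enumerate(
        ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
         "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"],
        start=1,
    )
}

# Assumption: "DT   DD-MON-YYYY (Rel. #, event[, Version #])" as in the sample entry.
DT_PATTERN = re.compile(
    r"^DT\s+(?P<day>\d{2})-(?P<month>[A-Z]{3})-(?P<year>\d{4})\s+"
    r"\(Rel\. (?P<release>[^,]+), (?P<event>[^,)]+?)\s*"
    r"(?:, Version (?P<version>\d+))?\)\s*$"
)


def parse_dt_line(line):
    """Return the date, release, event text and version (or None) of a DT line."""
    match = DT_PATTERN.match(line)
    if match is None:
        raise ValueError(f"unrecognised DT line: {line!r}")
    version = match.group("version")
    return {
        "date": date(int(match.group("year")),
                     MONTHS[match.group("month")],
                     int(match.group("day"))),
        "release": match.group("release"),
        "event": match.group("event"),
        "version": int(version) if version is not None else None,
    }


created = parse_dt_line("DT   12-FEB-1996 (Rel. 3, arrived in LIGM-DB )")
updated = parse_dt_line("DT   04-JAN-2000 (Rel. 12, Last updated, Version 4)")
print(created["date"], created["event"])    # 1996-02-12 arrived in LIGM-DB
print(updated["release"], updated["version"])  # 12 4
```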
Note: from September 2000, IMGT/LIGM-DB flat file releases are numbered as YYYYWW-D, where YYYY is the year, WW is the number of the week within the year, and D is the number of the day within the week.

  4.4 The DE (DEscription) lines
The DE (DEscription) lines contain general descriptive information about the stored sequence: the EMBL description and the IMGT/LIGM-DB assigned description.

 

The IMGT/LIGM-DB assigned description includes information about the entries:

species:
eg:
Homo sapiens
Gorilla gorilla

loci, genes or chains:

 
Name                      Description
------------------------  ----------------------------------------------------------------------
Ig                        Immunoglobulin loci, genes, or chains
Ig-Heavy                  Immunoglobulin Heavy loci, genes, or chains
Ig-Heavy-Alpha            Immunoglobulin Heavy Alpha genes or chains
Ig-Heavy-Alpha-1          Immunoglobulin Heavy Alpha-1 genes or chains
Ig-Heavy-Alpha-2          Immunoglobulin Heavy Alpha-2 genes or chains
Ig-Heavy-Delta            Immunoglobulin Heavy Delta genes or chains
Ig-Heavy-Epsilon          Immunoglobulin Heavy Epsilon genes or chains
Ig-Heavy-Gamma            Immunoglobulin Heavy Gamma genes or chains
Ig-Heavy-Gamma-1          Immunoglobulin Heavy Gamma-1 genes or chains
Ig-Heavy-Gamma-2          Immunoglobulin Heavy Gamma-2 genes or chains
Ig-Heavy-Gamma-2-a        Immunoglobulin Heavy Gamma-2-a genes or chains
Ig-Heavy-Gamma-2-b        Immunoglobulin Heavy Gamma-2-b genes or chains
Ig-Heavy-Gamma-2-c        Immunoglobulin Heavy Gamma-2-c genes or chains
Ig-Heavy-Gamma-3          Immunoglobulin Heavy Gamma-3 genes or chains
Ig-Heavy-Gamma-4          Immunoglobulin Heavy Gamma-4 genes or chains
Ig-Heavy-Khi              Immunoglobulin Heavy Khi genes or chains (Skate, Xenopus)
Ig-Heavy-Mu               Immunoglobulin Heavy Mu genes or chains
Ig-Heavy-Nu               Immunoglobulin Heavy Nu genes or chains, also designated as NAR (Nurse Shark)
Ig-Heavy-Omega            Immunoglobulin Heavy Omega genes or chains (Shark)
Ig-Heavy-Upsilon          Immunoglobulin Heavy Upsilon genes or chains
Ig-Light                  Immunoglobulin Light loci, genes, or chains
Ig-Light-Iota             Immunoglobulin Light Iota loci, genes, or chains
Ig-Light-Kappa            Immunoglobulin Light Kappa loci, genes, or chains
Ig-Light-Lambda           Immunoglobulin Light Lambda loci, genes, or chains
Ig-Surrogate              Immunoglobulin pseudo-light genes or chains of the pre-B cell receptor
Ig-Surrogate-Lambda-5     Immunoglobulin Surrogate Lambda-5 gene or chain of the pre-B cell receptor
Ig-Surrogate-Lambda-like  Immunoglobulin Surrogate Lambda-like gene or chain of the pre-B cell receptor
Ig-Surrogate-VpreB        Immunoglobulin Surrogate VpreB gene or chain of the pre-B cell receptor
Ig-Surrogate-VpreB-1      Immunoglobulin Surrogate VpreB-1 gene or chain of the pre-B cell receptor
Ig-Surrogate-VpreB-2      Immunoglobulin Surrogate VpreB-2 gene or chain of the pre-B cell receptor
TcR                       T cell Receptor loci, genes, or chains
TcR-Alpha                 T cell Receptor Alpha loci, genes, or chains
TcR-Beta                  T cell Receptor Beta loci, genes, or chains
TcR-Beta-1                T cell Receptor Beta-1 genes or chains
TcR-Beta-2                T cell Receptor Beta-2 genes or chains
TcR-Delta                 T cell Receptor Delta loci, genes, or chains
TcR-Gamma                 T cell Receptor Gamma loci, genes, or chains
TcR-Gamma-1               T cell Receptor Gamma-1 genes or chains
TcR-Gamma-2               T cell Receptor Gamma-2 genes or chains
TcR-Gamma-3               T cell Receptor Gamma-3 genes or chains
TcR-Gamma-4               T cell Receptor Gamma-4 genes or chains
TcR-Gamma-5               T cell Receptor Gamma-5 genes or chains
TcR-PreT-Alpha            T cell Receptor PreT Alpha genes or chain of the pre-T cell receptor


configuration:

germline        for sequences related to Ig or TcR variable gene, diversity segment, and joining segment
rearranged      for sequences related to Ig or TcR variable gene, diversity segment, and joining segment
unknown         for sequences related to Ig or TcR variable gene, diversity segment, and joining segment
undefined       for sequences related to Ig or TcR constant gene only

functionality:

  1. For "Germline"

    The definition of functionality for a germline entity (V-GENE, C-GENE,
    J-SEGMENT or D-SEGMENT) is based on sequence analysis.

    FUNCTIONAL
        A germline entity (V-GENE, C-GENE, J-SEGMENT or D-SEGMENT) is functional if the coding
        region has an open reading frame without a stop codon, and if there is no described defect
        in the splicing sites, recombination signals and/or regulatory elements.

    ORF (Open Reading Frame)
        A germline entity (V-GENE, C-GENE, J-SEGMENT or D-SEGMENT) is qualified as ORF (Open
        Reading Frame) if the coding region has an open reading frame, but:
            alterations have been described in the splicing sites, recombination signals and/or
            regulatory elements,
            and/or changes of conserved amino acids have been suggested by the authors to lead to
            incorrect folding,
            and/or the germline entity is an ORPHON.
        A germline J-SEGMENT with an open reading frame and no described defect, but preceding a
        C-GENE which is a pseudogene, is qualified as ORF.

    PSEUDOGENE
        A pseudogene germline entity (V-GENE, C-GENE, J-SEGMENT or D-SEGMENT) is
        characterized by the presence of stop codon(s) and/or frameshift mutation(s).
        A V-GENE is considered a pseudogene if these defects occur in the L-PART1 and/or V-EXON,
        or if there is a mutation in the L-PART1 INIT-CODON atg.

    VESTIGIAL (or relics)
        Defines germline sequences which cannot be assigned to a given subgroup because they are
        too divergent from the other pseudogenes and have too many stop codons and frameshifts.

  2. For "everything except Germline"

    PRODUCTIVE
        A rearranged (genomic or cDNA) entity is productive if the Ig or TcR sequence has an open
        reading frame, with no stop codon and no defect described in the initiation codon, splicing
        sites and/or regulatory elements, and an in-frame JUNCTION.

    UNPRODUCTIVE
        A rearranged (genomic or cDNA) entity is unproductive if the Ig or TcR sequence is
        characterized by an out-of-frame JUNCTION and/or the presence of stop codon(s) and/or
        frameshift mutation(s), and/or a defect described in the splicing sites and/or the
        regulatory element(s), and/or unusual features (TRANSLOCATED, GENE FUSION...).
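
The functionality definitions above can be read as a small decision procedure. The following is only an illustrative sketch under simplifying assumptions: the function names and boolean parameters are hypothetical, each observation is assumed to have already been established by sequence analysis or from the literature, and the VESTIGIAL category and the V-GENE specific INIT-CODON rule are not modelled. It is not the procedure used by IMGT annotators.

```python
def germline_functionality(stop_codon_or_frameshift, described_defect=False,
                           is_orphon=False, precedes_pseudogene_c_gene=False):
    """Return FUNCTIONAL, ORF or PSEUDOGENE for a germline entity (simplified)."""
    if stop_codon_or_frameshift:
        return "PSEUDOGENE"
    # Open reading frame, but with a reported alteration, an orphon location,
    # or (for a J-SEGMENT) a following C-GENE that is a pseudogene.
    if described_defect or is_orphon or precedes_pseudogene_c_gene:
        return "ORF"
    return "FUNCTIONAL"


def rearranged_functionality(in_frame_junction, stop_codon_or_frameshift=False,
                             described_defect=False, unusual_feature=False):
    """Return PRODUCTIVE or UNPRODUCTIVE for a rearranged entity (simplified)."""
    if (in_frame_junction and not stop_codon_or_frameshift
            and not described_defect and not unusual_feature):
        return "PRODUCTIVE"
    return "UNPRODUCTIVE"


print(germline_functionality(stop_codon_or_frameshift=False))                  # FUNCTIONAL
print(germline_functionality(stop_codon_or_frameshift=False, is_orphon=True))  # ORF
print(rearranged_functionality(in_frame_junction=False))                       # UNPRODUCTIVE
```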

structure and localisation:

 chimeric               defines an in vitro or in vivo fusion gene between Ig and/or TcR genes. [2 sources]
 engineered             defines an Ig or TcR gene modified by deliberate mutagenesis in vitro. [1 source]
 gene-fusion            defines an in vitro gene fusion between two or more different genes (at least one of them being Ig or TcR).
                        [1 source (Ig or TcR) + X]
 humanized              defines a natural or synthetic human Ig or TcR gene modified in vitro with non-human recognition
                        site sequences. [1 source (murine, rabbit...) + 1 source (human)]
 orphon                 defines an Ig or TcR gene found in vivo on a different locus from the main locus (either on the same
                        chromosome or on another chromosome), without hallmarks of RNA processing.
 processed              defines an Ig or TcR gene found in vivo on a different locus from the main locus (either on the same
                        chromosome or on another chromosome), with hallmarks of RNA processing (spliced regions).
 regular                defines an Ig or TcR gene with no special characteristics regarding its in vivo localisation and with no
                        in vitro modifications.
 scFv                   defines two immunoglobulin (or by extension T cell receptor) V-DOMAINs covalently linked by a short linker
                        peptide in vitro. [1 or 2 sources]
 transgene              defines an Ig or TcR gene artificially introduced into a multicellular organism (mouse, plant...).
 translocated           defines a fused gene resulting from a translocation (in vivo), at least one of the involved loci being an
                        Ig or TcR locus. [1 source (Ig or TcR) + X]
 transposed             defines an Ig or TcR transgene permanently inserted in a chromosome.
 unusual                defines an Ig or TcR gene with unexpected feature(s) (for instance, insertion of unknown sequences,
                        unexpected rearrangements by inversion...).


variable region group:

                        IGHV
                        IGKV
                        IGLV
                        IGL1V
                        IGL2V
                        TRAV
                        TRBV
                        TRDV
                        TRGV
specificity:
             eg:        anti-F(ab')2
                        anti-Fc
                        anti-HIV
                        anti-HIV_1
                        anti-HLA
                        anti-HLA-DQ3
                        anti-HLA-DR
                        anti-CD19
                        anti-CD29
                        anti-CD4
                        anti-CD8

The format of a DE line is:
DE   EMBL description (free text)
DE   species; receptor and chain; nucleic acid type; functionality; structure; 
DE   chain; subgroup; specificity;
  4.5 The KW (KeyWord) lines
The KW (KeyWord) lines provide information which can be used to generate cross-reference indexes of the sequence entries based on functional, structural, or other categories deemed important. They contain standardized keywords, which describe the structure of the entry (see the keyword.doc), followed by free text which further describes the entry. Keywords are assigned hierarchically, all Immunoglobulin and T cell Receptor sequences having the generic terms 'antigen receptor' and 'immunoglobulin superfamily'. The free text following the standardized keywords is assigned alphabetically.
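
As an illustration, the KW lines of an entry could be split into individual keywords as sketched below. The sketch assumes, from the sample entry in section 3, that keywords are separated by semicolons and that the last KW line ends with a period; the function name parse_keywords is hypothetical.

```python
def parse_keywords(kw_lines):
    """Return the keywords found on the KW lines of one entry, in order."""
    # Drop the "KW" line code, join continuation lines, remove the final period.
    text = " ".join(line[2:].strip() for line in kw_lines)
    text = text.rstrip(". ")
    return [keyword.strip() for keyword in text.split(";") if keyword.strip()]


kw_lines = [
    "KW   antigen receptor; immunoglobulin superfamily; TcR; TcR gamma-delta; ",
    "KW   TcR-Gamma; variable; IMGT reference sequence; t cell receptor. ",
]
keywords = parse_keywords(kw_lines)
print(keywords[:2])   # ['antigen receptor', 'immunoglobulin superfamily']
print(len(keywords))  # 8
```

In the sample entry, the two generic terms come first, as expected from the hierarchical assignment.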

  4.6 The FT (Feature Table) lines
The FT (Feature Table) lines provide a mechanism for the annotation of the sequence data. Regions or sites in the sequence which are of interest are listed in the table. In general, the features in the feature table represent signals or other characteristics reported in the cited references. In some cases, ambiguities or features noted in the course of data preparation have been included. The feature table is subject to expansion or change as more becomes known about a given sequence (see the ftable.doc for a more complete description).
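
A minimal sketch of reading FT lines into features is given below. It relies only on what the sample entry in section 3 shows: a feature key starting in column 6 opens a new feature, lines starting with "/" carry a qualifier, and other lines continue the previous qualifier value (as for a long /translation). The function name parse_feature_table is hypothetical, and continuation text is concatenated without a space, which suits translations but may not suit free-text qualifiers; see the ftable.doc for the authoritative format.

```python
def parse_feature_table(ft_lines):
    """Return a list of features, each with a key, a location and qualifiers."""
    features = []
    last_qualifier = None
    for line in ft_lines:
        text = line[2:].strip()  # drop the "FT" line code
        if not text:
            continue
        if line[5:6].strip():
            # A feature key in column 6 starts a new feature.
            key, location = text.split(None, 1)
            features.append({"key": key, "location": location, "qualifiers": {}})
            last_qualifier = None
        elif text.startswith("/"):
            name, _, value = text[1:].partition("=")
            features[-1]["qualifiers"][name] = value
            last_qualifier = name
        elif last_qualifier is not None:
            # Continuation of a long qualifier value such as /translation.
            features[-1]["qualifiers"][last_qualifier] += text
    for feature in features:
        feature["qualifiers"] = {
            name: value.strip('"') for name, value in feature["qualifiers"].items()
        }
    return features


ft_lines = [
    'FT   FR1-IMGT            <1..72',
    'FT                       /partial',
    'FT                       /AA_IMGT="3 to 26"',
    'FT                       /translation="QLKQTEVSVTRETDESAQISCIAS"',
    'FT   1st-CYS             61..63',
]
features = parse_feature_table(ft_lines)
print([feature["key"] for feature in features])  # ['FR1-IMGT', '1st-CYS']
print(features[0]["qualifiers"]["AA_IMGT"])      # 3 to 26
```

Qualifiers without a value, such as /partial, are stored with an empty string.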


5. INDEX FILE FORMAT

The index key of each index file (keywords.ndx, accession.ndx, shordir.ndx, etc.) is sorted alphabetically; the names of all entries containing the index key are listed alphabetically after the key. Each value of the index key begins on a new line in column 1, and the associated entry names begin on the next line (except for the accession number index, where the associated information appears on the same line). Lines containing entry names are in fixed format as follows:
                     Columns   Description
                     -------   ---------------------------
                       14-25   entry name (left-justified)
                       27-29   division code
                       31-40   primary accession number

                       44-55   entry name (left-justified)
                       57-59   division code
                       61-70   primary accession number
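
The fixed-column layout above could be read as in the following sketch, which slices each entry-name line at the documented columns and collects the entries under the index key that precedes them. The function names are hypothetical, and the excerpt is built from lines of the species index excerpt shown in this section (the accession number index uses a different layout and is not handled).

```python
def parse_index_entry_line(line):
    """Return (entry name, division, accession) triples from one entry-name line."""
    triples = []
    for start in (13, 43):  # zero-based offsets of columns 14 and 44
        name = line[start:start + 12].strip()            # columns 14-25 / 44-55
        division = line[start + 13:start + 16].strip()   # columns 27-29 / 57-59
        accession = line[start + 17:start + 27].strip()  # columns 31-40 / 61-70
        if name:
            triples.append((name, division, accession))
    return triples


def parse_index(text):
    """Map each index key (starting in column 1) to its list of entry triples."""
    index = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] != " ":
            key = line.rstrip()
            index[key] = []
        elif key is not None:
            index[key].extend(parse_index_entry_line(line))
    return index


excerpt = "\n".join([
    "Bos taurus (cattle)",
    " " * 13 + "BTIGG1HC     MAM X16701",
    "Caiman crocodylus",
    " " * 13 + "CCIGHVB      VRT M12769       CCIG01       VRT V00146",
])
index = parse_index(excerpt)
print(index["Bos taurus (cattle)"])    # [('BTIGG1HC', 'MAM', 'X16701')]
print(len(index["Caiman crocodylus"]))  # 2
```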

Species Index

This file lists all species which appear in the database. It is sorted alphabetically on genus and species. Common names in English will be listed, if present in the database entries. An excerpt from the species index file is given below (the ruler is presented for your convenience - it does not appear in the index file):
1       10        20        30        40        50        60        70        80
+--------+---------+---------+---------+---------+---------+---------+---------+
Artificial gene
             AGGCHIA      SYN K03553      
Bos javanicus
             BOVTCRVB6    MAM L18950      
Bos taurus (cattle)
             BTIGG1HC     MAM X16701      
Caiman crocodylus
             CCIGHVB      VRT M12769       CCIG01       VRT V00146 
+--------+---------+---------+---------+---------+---------+---------+---------+
1       10        20        30        40        50        60        70        80

Keyword Index

This file lists all keywords which appear in the database (on the KW lines). It is sorted alphabetically on keyword. Two entry names fit on each line; if an index key has more than two entries associated with it, additional lines are used (with exactly the same layout). An excerpt from the keyword index file is given below (the ruler is presented for your convenience - it does not appear in the index file):
1       10        20        30        40        50        60        70        80
+--------+---------+---------+---------+---------+---------+---------+---------+
IgD
             HSIGCB9      PRI K01311      
IgG
             MMIGK21      ROD D14728       MMIGGVL      ROD X81463      
IgM
             HSIGCB6      PRI J00259       HSIGCB3      PRI K01307      
             HSIGCB5      PRI K01309       HSIGHZD      PRI L29120      
             HSIGHZG      PRI L29153       HSIGHZH      PRI L29154      
             HSIGLZG      PRI L29156       MMIGHDG      ROD M11699      
             MMIGHDH      ROD M11700       MMIGHDJ      ROD M11702      
             MMIGHDM      ROD M11705       MMIGHDP      ROD M11708      
+--------+---------+---------+---------+---------+---------+---------+---------+
1       10        20        30        40        50        60        70        80

Accession Number Index

This file lists all accession numbers which appear in the database. It is sorted alphabetically on accession number. Each accession number is followed by the entry name and entry division in which it occurs. Accession numbers which have been deleted from the database also appear in this index, with the word DELETED (left-justified) in the entry name field. An excerpt from the accession number index file is given below (the ruler is presented for your convenience - it does not appear in the index file):
1       10        20        30        40        50        60        70        80
+--------+---------+---------+---------+---------+---------+---------+---------+
D01059       HSIGLYM1     PRI
D12725       MMD12725     ROD
D12727       MMD12727     ROD
D12729       MMD12729     ROD
D12733       MMD12733     ROD
D12735       MMD12735     ROD
+--------+---------+---------+---------+---------+---------+---------+---------+
1       10        20        30        40        50        60        70        80

Short Directory Index

This file contains summary information about entries including a brief description of each entry, its sequence length, molecule type and data annotation level. The description line from the original EMBL entry is also included. The file is sorted alphabetically on entry name. The lines are in fixed-format as follows:
   Columns   Field Name          Description
   -------   ---------------     -------------------------------------------
     01-10   entry name          left-justified
     12-12   entry status        + = new at this release
                                 * = updated at this release
                                 blank = unchanged from previous release
     14-14   data class          IMGT/LIGM annotation level 1 to 3
     16-18   molecule type       DNA or RNA
     20-22   division            three-letter division code
     24-29   sequence length     right-justified
     31-80   description         left-justified
If an entry's description cannot fit into columns 31-80, it will be continued onto one or more additional lines. Continuation lines contain description text (left-justified) in columns 31-80; columns 01-30 are blank. An excerpt from the short directory index file is given below (the ruler is presented for your convenience - it does not appear in the index file):
1       10        20        30        40        50        60        70        80
+--------+---------+---------+---------+---------+---------+---------+---------+
AGGCHIA    + 2 RNA SYN    403 Mouse/human Ig active chimeric kappa-chain mRNA
                              (V-J5:mouse/C:  human).
                              rearranged; Ig-Light-Kappa; VKappa; KV6;
BOVTCRVB6  * 3 DNA MAM    415 Bos javanicus T cell receptor gene V-region,
                              exons 1 (3' end) and 2  (5' end).
                              TCR; functional;
BTIGG1HC   * 2 DNA MAM   1830 Bovine Ig germline heavy chain gamma-1-chain gene
                              C-region, 3' end
                              germline; Ig-Heavy; functional;
+--------+---------+---------+---------+---------+---------+---------+---------+
1       10        20        30        40        50        60        70        80
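
The short directory layout described above could be read as in this sketch, which slices each line at the documented columns and appends continuation lines (columns 01-30 blank) to the description of the preceding entry. The function name parse_short_directory and the dictionary keys are hypothetical; the input lines are taken from the excerpt above.

```python
def parse_short_directory(text):
    """Return one record per entry of a short directory index."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        description = line[30:80].strip()  # columns 31-80
        if not line[:30].strip():
            # Continuation line: columns 01-30 are blank.
            records[-1]["description"] += " " + description
            continue
        records.append({
            "entry_name": line[0:10].strip(),    # columns 01-10
            "status": line[11:12].strip(),       # column 12: "+", "*" or ""
            "data_class": line[13:14],           # column 14
            "molecule": line[15:18].strip(),     # columns 16-18
            "division": line[19:22].strip(),     # columns 20-22
            "length": int(line[23:29]),          # columns 24-29, right-justified
            "description": description,
        })
    return records


excerpt = "\n".join([
    "AGGCHIA    + 2 RNA SYN    403 Mouse/human Ig active chimeric kappa-chain mRNA",
    " " * 30 + "(V-J5:mouse/C:  human).",
    " " * 30 + "rearranged; Ig-Light-Kappa; VKappa; KV6;",
])
record = parse_short_directory(excerpt)[0]
print(record["entry_name"], record["status"], record["molecule"], record["length"])
# AGGCHIA + RNA 403
```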

This manual and the database it accompanies may be copied and redistributed freely, without advance permission, provided that this statement is reproduced with each copy.


Last modified: February 2004


Software material and data coming from the IMGT server may be used for academic research only, provided that it is referred to IMGT, and cited as "IMGT, the international ImMunoGeneTics database http://imgt.cines.fr:8104 (Initiator and coordinator: Marie-Paule Lefranc, Montpellier, France)." References to cite: Lefranc, M.-P. et al., Nucleic Acids Research, 27, 209-212 (1999); Ruiz, M. et al., Nucleic Acids Research, 28, 219-221 (2000); Lefranc, M.-P., Nucleic Acids Research, 29, 207-209 (2001) and Nucleic Acids Research, 31, 307-310 (2003).

For any other use please contact Marie-Paule Lefranc lefranc@ligm.igh.cnrs.fr.


IMGT initiator and coordinator: Marie-Paule Lefranc (lefranc@ligm.igh.cnrs.fr)
Bioinformatics manager: Véronique Giudicelli (giudi@ligm.igh.cnrs.fr)
Computer manager: Denys Chaume (Denys.Chaume@igh.cnrs.fr)
Interface design: Chantal Ginestoux (chantal@ligm.igh.cnrs.fr)

© Copyright 1995-2004 IMGT, the international ImMunoGeneTics database