IMGT-ONTOLOGY is the first ontology for immunogenetics and immunoinformatics. It provides a semantic specification of the terms to be used in immunogenetics and immunoinformatics and manages the related knowledge, thus allowing the standardization of immunogenetics data from genome, proteome, genetics, two-dimensional (2D) and three-dimensional (3D) structures. IMGT-ONTOLOGY manages the knowledge through diverse facets relying on seven axioms, "IDENTIFICATION", "CLASSIFICATION", "DESCRIPTION", "NUMEROTATION", "LOCALIZATION", "ORIENTATION" and "OBTENTION". These axioms postulate that any object, any process and any relation can be identified, classified, described, numbered, localized and orientated, and that the way it is obtained can be characterized. The axioms constitute the Formal IMGT-ONTOLOGY, also designated as IMGT-Kaleidoscope. As the same axioms can be used to generate concepts for multi-scale level approaches, the Formal IMGT-ONTOLOGY represents a paradigm for systems biology ontologies, which need to identify, to classify, to describe, to number, to localize and to orientate objects, processes and relations at the molecule, cell, tissue, organ, organism or population levels. IMGT-ONTOLOGY 1.0.0 Marie-Paule Lefranc Véronique Giudicelli D-J-C-sequence 5 true imgt_000015 Identifies, for cDNA, molecule entities with a partially_rearranged D region, a rearranged J region and a C region in undefined configuration. D-J-C-sequence is in partially_rearranged configuration. partially_spliced imgt_000092 Identifies the structure of "Molecule_EntityType" leafconcepts which are transcripts or cDNA sequences that have been submitted to partial RNA processing or splicing. imgt_000110 3 Identifies, for cDNA, molecule entities with a leader L region and a conventional region in undefined configuration. L-nt-sequence is in undefined configuration.
true L-nt-sequence Molecule_StructureType The "Molecule_StructureType" concept allows to identify, whatever the molecule type (gDNA, cDNA, mRNA or protein), the structure of Molecule_EntityType leafconcepts. imgt_000082 true Identifies, for cDNA, molecule entities with a germline J region and a C region in undefined configuration. J-C-sequence is in germline configuration. true 4 imgt_000021 J-C-sequence chimeric For example, L-V-D-J-C-sequence of IG resulting from the fusion in vitro of the variable domain from one source (murine, rat, …) with the constant region from human [2 sources]. imgt_000084 Identifies, whatever the molecule type, the structure of "Molecule_EntityType" leafconcepts that have a classical organization and result from the fusion in vivo or in vitro of molecules from two (usually) sources. Identifies complementary DNA, a nucleotide sequence made of A, T, C, G, obtained in vitro by reverse transcription of mRNA. true cDNA imgt_000070 true Identifies a constant (C) gene, a gene that codes the constant region of an IG or of a TR chain. imgt_000053 constant imgt_000031 true 4 Identifies, for mRNA, molecule entities with a leader L region, a rearranged V region and a partially_rearranged D. L-V-D-transcript is in partially_rearranged configuration. L-V-D-transcript The "Molecule_LocationType" concept allows to identify, whatever the molecule type (gDNA, cDNA, mRNA or protein), the Molecule_EntityType leafconcepts based on their location. true Molecule_LocationType imgt_000059 true imgt_000112 Identifies any (coding or not coding) gene other than IG or TR genes with a leader L region (or signal peptide). conventional_with_leader imgt_000025 3 J-transcript true Identifies, for mRNA, molecule entities with a germline J region. J-transcript is in germline configuration. 4 imgt_000030 L-V-D-sequence true Identifies, for cDNA, molecule entities with a leader L region, a rearranged V region and a partially_rearranged D.
L-V-D-sequence is in partially_rearranged configuration. diversity true imgt_000055 Identifies a diversity (D) gene, a gene that rearranges at the DNA level and codes the diversity region of the variable domain of an IG or of a TR chain. It comprises six leafconcepts. 'conventional_with_leader' and 'conventional_without_leader' identify any (coding or not coding) gene, with or without leader L region (or signal peptide), respectively, other than the immunoglobulin (IG) or T cell receptor (TR) genes of the vertebrate adaptive immune response. The other four leafconcepts refer to the IG and TR genes and are specific to immunogenetics: three of these leafconcepts, 'variable' (V), 'diversity' (D) and 'joining' (J) identify the IG and TR genes that rearrange at the DNA level in the B and T cells and code the V, D and J regions, respectively, of the IG and TR variable domains; the fourth leafconcept, 'constant' (C), identifies the IG and TR genes that code the C region of the IG and TR chains. imgt_000052 The "GeneType" concept allows to identify the type of gene. true GeneType true EntityType imgt_000007 The "EntityType" concept allows to identify the type of entity. An entity can be a molecule, a cell, a tissue, an organ, an organism or a population. imgt_000027 Identifies, for protein, molecule entities with a leader L region (immature form), a rearranged V region, at least one rearranged D region, a rearranged J region and a C region in undefined configuration. L-V-D-J-C-chain is in rearranged configuration. true L-V-D-J-C-chain 6 imgt_000029 Identifies, for mRNA, molecule entities with a leader L region, a rearranged V region, at least one rearranged D region, a rearranged J region and a C region in undefined configuration. L-V-D-J-C-transcript is in rearranged configuration.
6 L-V-D-J-C-transcript true imgt_000090 Identifies, whatever the molecule type, the structure of "Molecule_EntityType" leafconcepts that have a transmembrane exon or region allowing for a transmembrane chain. membrane true Identifies a sequence made of nucleotides (A, U or T, C, G). imgt_000069 nucleic_acid Identifies genomic DNA, a nucleotide sequence made of A, T, C, G, obtained from a genome, or by extension, synthetic DNA having the characteristics of genomic DNA. imgt_000071 true gDNA true L-AA-chain imgt_000026 3 Identifies, for protein, molecule entities with a leader L region (immature form) and a conventional region in undefined configuration. L-AA-chain is in undefined configuration. The "LocationType" concept allows to identify the type of location of EntityType leafconcepts. imgt_000058 true LocationType gene Identifies a gDNA sequence unit that can be potentially transcribed and/or translated. This definition includes the coding region or 'region', the regulatory elements in 5' and 3', and the introns, if present. imgt_000077 T_cell_receptor T cell receptors are proteins on the surface membrane of T lymphocytes, capable of specific recognition and binding with a peptide (or "processed antigen") associated with a major histocompatibility complex (MHC) (human leucocyte antigen (HLA) in human). imgt_000067 true imgt_000068 The "MoleculeType" concept allows to identify the type of molecule, based on the type of the constitutive elements and on the concepts of obtention. MoleculeType true imgt_000047 Identifies, whatever the molecule type, the functionality of Molecule_EntityType leafconcepts in undefined or germline configuration, whose coding region has an open reading frame without stop codon, and if there is no described defect in the splicing sites, recombination signals and/or regulatory elements.
true functional true imgt_000043 5 Identifies, for protein, molecule entities without a leader L region (mature form) and with a rearranged V region, a rearranged J region and a C region in undefined configuration. V-J-C-chain is in rearranged configuration. V-J-C-chain transgene true imgt_000061 Identifies, whatever the molecule type, a gene that is artificially introduced into a multicellular organism (mouse, plant...). true Identifies, for gDNA, molecule entities with a conventional gene in undefined configuration. conventional-gene is in undefined configuration. imgt_000013 conventional-gene 3 translocated true imgt_000062 Identifies, whatever the molecule type, a gene that results from a translocation (in vivo). imgt_000044 Identifies, for gDNA, molecule entities with a rearranged V gene and a rearranged J gene. V-J-gene is in rearranged configuration. V-J-gene true 4 imgt_000094 Identifies, whatever the molecule type, the structure of "Molecule_EntityType" leafconcepts that have a classical organization without in vivo or in vitro modification. regular L-V-J-C-transcript true imgt_000034 5 Identifies, for mRNA, molecule entities with a leader L region, a rearranged V region, a rearranged J region and a C region in undefined configuration. L-V-J-C-transcript is in rearranged configuration. The "Molecule_FunctionalityType" concept allows to identify, whatever the molecule type (gDNA, cDNA, mRNA or protein), the type of functionality of a Molecule_EntityType leafconcept. true Three leafconcepts, 'functional', 'ORF' (Open Reading Frame) and 'pseudogene', identify the functionality of Molecule_EntityType leafconcepts in undefined configuration (conventional genes and IG and TR constant (C) genes) or in germline configuration (IG and TR variable (V), diversity (D) and joining (J) genes before DNA rearrangements). 
Two leafconcepts, 'productive' and 'unproductive', identify the functionality of Molecule_EntityType leafconcepts in rearranged or partially_rearranged configuration (IG and TR entities after DNA rearrangements, and by extension fusion entities resulting from translocations, and hybrid entities obtained by biotechnology molecular engineering). Molecule_FunctionalityType imgt_000046 L-V-J-C-chain true 5 imgt_000032 Identifies, for protein, molecule entities with a leader L region (immature form), a rearranged V region, a rearranged J region and a C region in undefined configuration. L-V-J-C-chain is in rearranged configuration. J-C-gene Identifies, for gDNA, molecule entities that are processed (usually orphon) with a germline J gene joined to a C gene in undefined configuration. J-C-gene is in germline configuration. imgt_000020 4 true imgt_000060 orphon true Identifies, whatever the molecule type, a gene that is found in vivo on a different locus from the main locus (either on the same chromosome or on another chromosome). true The "TaxonRank" concept allows to identify the type of taxon in which an object, process or relation is found. The "TaxonRank" concept manages a hierarchy of concepts at various levels of granularity. The corresponding hierarchical taxonomy is that provided by the National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov) up to the rank of species and subspecies ("Species" and "Subspecies" concepts, respectively) in order to establish complete interoperability with generalist databases. Since genes of the immunoglobulins (or antibodies) (IG), T cell receptors (TR) and major histocompatibility complex (MHC) are only present in jawed vertebrates (gnathostoma), only vertebrate species were originally represented in IMGT-ONTOLOGY. However, with the extension of IMGT-ONTOLOGY to the immunoglobulin superfamily (IgSF) and MHC superfamily (MhcSF), invertebrate species are incorporated whenever necessary.
The "EthnicGroup", "Breed" and "Strain" concepts have been added to IMGT-ONTOLOGY to allow the identification of data specific to ethnic groups for humans (http://www.ebi.ac.uk/imgt/hla/help/ethnic_help.html), breeds for domestic animals or strains for laboratory and wild animals. TaxonRank imgt_000100 Identifies, whatever the molecule type, the functionality of Molecule_EntityType leafconcepts in rearranged or partially_rearranged configuration, whose coding region has an open reading frame without stop codon, if for IG and TR there is an in-frame junction, and if there is no described defect in the initiation codon, splicing sites and/or regulatory elements. productive imgt_000049 true 3 Identifies, for mRNA, molecule entities with a C region in undefined configuration. C-transcript is in undefined configuration. true imgt_000012 C-transcript true 3 J-gene imgt_000023 Identifies, for gDNA, molecule entities with a germline J gene. J-gene is in germline configuration. Identifies the structure of "Molecule_EntityType" chain leafconcepts which are chains with a leader L region (or signal peptide). immature_form imgt_000088 true 3 imgt_000011 Identifies, for cDNA, molecule entities with a C region in undefined configuration. C-sequence is in undefined configuration. true C-sequence imgt_000051 unproductive true Identifies, whatever the molecule type, the functionality of Molecule_EntityType leafconcepts in rearranged or partially_rearranged configuration, whose coding region has stop codon(s) and/or frameshift mutation(s), and/or for IG and TR an out-of-frame junction, and/or if a mutation affects the initiation codon, and/or if there are defects in the splicing sites and/or in the regulatory element(s), and/or there are unusual features (translocated, gene fusion...) and/or changes of conserved amino acids demonstrated as leading to incorrect folding. The "MolecularComponent" concept allows to identify molecular components.
true MolecularComponent imgt_000064 Identifies, for cDNA, molecule entities with a conventional region in undefined configuration. nt-sequence is in undefined configuration. nt-sequence true imgt_000037 3 true V-gene 3 imgt_000042 Identifies, for gDNA, molecule entities with a germline V gene. V-gene is in germline configuration. imgt_000079 peptide Identifies a short amino acid sequence unit (made of a small number of amino acids). 5 Identifies, for gDNA, molecule entities with a rearranged V gene, at least one rearranged D gene and a rearranged J gene. V-D-J-gene is in rearranged configuration. imgt_000041 V-D-J-gene true Identifies a joining (J) gene, a gene that rearranges at the DNA level and codes the joining region of the variable domain of an IG or of a TR chain. joining imgt_000056 true Identifies the structure of "Molecule_EntityType" gene leafconcepts that have characteristics for potential alternative splicing (for example, IG genes with features for potential secreted and membrane chains). imgt_000083 alternative_splicing imgt_000016 Identifies, for mRNA, molecule entities with a partially_rearranged D, a rearranged J region and a C region in undefined configuration. D-J-C-transcript is in partially_rearranged configuration. true 5 D-J-C-transcript true Identifies, for mRNA, molecule entities with a leader L region and a germline V region. L-V-transcript is in germline configuration. imgt_000036 3 L-V-transcript conventional true Identifies any (coding or not coding) gene other than IG or TR genes. imgt_000054 true Identifies the structure of "Molecule_EntityType" leafconcepts which are chains without a leader L region (or signal peptide). imgt_000089 mature_form immunoglobulin An antibody or immunoglobulin monomer is formed by two identical light chains and two identical heavy chains. There are five classes of immunoglobulins in human, IgM, IgD, IgG, IgA and IgE, each with distinct heavy chains (mu, delta, gamma, alpha and epsilon, respectively).
When expressed at the surface of the B cells, immunoglobulins are anchored by their heavy chains. These membrane immunoglobulins (mIgM, mIgD, mIgG, mIgA and mIgE) are associated with CD79A (Ig-alpha, mb-1) and CD79B (Ig-beta, B29). One CD79A-CD79B heterodimer and one immunoglobulin monomer constitute the B cell receptor (BcR). When secreted by the plasmocytes, immunoglobulins are monomeric (IgG, IgD, IgE, and IgA in serum), dimeric (IgA in seromucous secretions), or pentameric (IgM). Immunoglobulins (or antibodies) are proteins capable of specific recognition and binding with an antigen. Antibodies carry antigen-binding sites that bind non-covalently with the corresponding antigen epitope. Antibodies are produced in the body by the B lymphocytes at the cell surface and are secreted by plasma cells, in response to stimulation by antigen. imgt_000065 true imgt_000006 Identifies, whatever the molecule type, the configuration of the conventional genes and that of the IG and TR constant (C) genes, and by extension, the configuration of the Molecule_EntityType leafconcepts that only contain genes in undefined configuration. true undefined true Allows to identify an interbred variant group inside a subspecies or species, for laboratory or wild animals. Strain imgt_000104 The "StructureType" concept allows to identify the type of structure of EntityType leafconcepts. StructureType true imgt_000081 true imgt_000057 variable Identifies a variable (V) gene, a gene that rearranges at the DNA level and codes the variable region of the variable domain of an IG or of a TR chain. Identifies, for mRNA, molecule entities with a conventional region in undefined configuration. nt-transcript is in undefined configuration. nt-transcript 3 true imgt_000038 imgt_000039 V-D-gene Identifies, for gDNA, molecule entities with a rearranged V gene and a partially_rearranged D gene. V-D-gene is in partially_rearranged configuration. 4 true cDNA_sequence Identifies a cDNA sequence unit.
imgt_000075 true rearranged Identifies, whatever the molecule type, the configuration of the IG and TR variable (V), diversity (D) and joining (J) genes after DNA rearrangements, and by extension, the configuration of the Molecule_EntityType leafconcepts that contain rearranged genes with, if present, completely rearranged D genes (with or without constant (C) genes in undefined configuration). imgt_000005 Identifies, whatever the molecule type, the configuration of the IG and TR variable (V), diversity (D) and joining (J) genes before DNA rearrangements, and by extension, the configuration of the Molecule_EntityType leafconcepts that contain germline genes (with or without constant (C) genes in undefined configuration). imgt_000003 true germline Identifies, whatever the molecule type, the structure of "Molecule_EntityType" leafconcepts that have a hydrophilic C-terminal exon or region allowing for a secreted soluble chain. secreted imgt_000095 3 imgt_000035 Identifies, for cDNA, molecule entities with a leader L region and a germline V region. L-V-sequence is in germline configuration. true L-V-sequence chain imgt_000076 Identifies an amino acid sequence unit. true imgt_000040 Identifies, for protein, molecule entities without a leader L region (mature form) and with a rearranged V region, at least one rearranged D, a rearranged J region and a C region in undefined configuration. V-D-J-C-chain is in rearranged configuration. V-D-J-C-chain 6 Identifies the structure of "Molecule_EntityType" leafconcepts which are transcripts that cannot be translated in vivo, and corresponding cDNA. For example, for IG or TR, transcripts of V, D or J genes in germline configuration (also designated as 'germline transcripts'), transcripts of C genes in undefined configuration, transcripts of switch regions, and corresponding cDNA, respectively.
sterile_transcript imgt_000097 imgt_000048 ORF Identifies, whatever the molecule type, the functionality of Molecule_EntityType leafconcepts in undefined or germline configuration, whose coding region has an open reading frame (ORF), but: * alterations have been described in the splicing sites, recombination signals and/or regulatory elements. * and/or changes of conserved amino acids have been suggested by the authors to lead to incorrect folding. * and/or the entity is an orphon. true imgt_000004 Identifies, whatever the molecule type, the configuration of partially_rearranged IG or TR diversity (D) genes, and by extension, the configuration of Molecule_EntityType leafconcepts that contain at least one partially_rearranged D gene [with another D or with rearranged variable (V) or joining (J) genes (with or without constant (C) genes in undefined configuration)]. partially_rearranged true Identifies, for gDNA, molecule entities with a partially_rearranged D gene and a rearranged J gene. D-J-gene is in partially_rearranged configuration. true 4 imgt_000017 D-J-gene true Identifies, for mRNA, molecule entities with a germline D region. D-transcript is in germline configuration. D-transcript imgt_000019 3 true Identifies messenger RNA, a nucleotide sequence made of A, U, C, G, obtained by transcription of gDNA. imgt_000072 mRNA Identifies a sequence made of amino acids, obtained by translation of mRNA, or by in vitro synthesis. imgt_000073 true protein Allows to identify an ethnic group inside the Homo sapiens species. EthnicGroup imgt_000102 true transcript imgt_000080 Identifies an mRNA sequence unit that is transcribed from gDNA and that can be potentially translated. imgt_000091 Identifies the structure of "Molecule_EntityType" leafconcepts which are genes (usually orphons) that have lost part of their introns.
partially_processed L-V-D-J-C-sequence 6 Identifies, for cDNA, molecule entities with a leader L region, a rearranged V region, at least one rearranged D region, a rearranged J region and a C region in undefined configuration. L-V-D-J-C-sequence is in rearranged configuration. imgt_000028 true imgt_000103 true Allows to identify a species. Species imgt_000010 3 C-gene true Identifies, for gDNA, molecule entities with a C gene in undefined configuration. C-gene is in undefined configuration. pseudogene imgt_000050 Identifies, whatever the molecule type, the functionality of Molecule_EntityType leafconcepts in undefined or germline configuration, whose coding region has stop codon(s) and/or frameshift mutation(s), and/or if, for a conventional gene or a leafconcept that contains a V gene, a mutation affects the initiation codon. true imgt_000066 true Major histocompatibility complex (MHC) proteins are capable of presenting peptides (or "processed antigen") to the T lymphocytes. The peptides are non-covalently bound in a groove formed by two G (groove) domains. The two domains belong to the same chain (alpha chain of the MHC class I), or to two different chains (alpha and beta chains of the MHC class II). In the MHC class I, the alpha chain is associated with the beta2-microglobulin (B2M). major_histocompatibility_complex Subspecies true imgt_000105 Allows to identify a subspecies. imgt_000086 Identifies, whatever the molecule type, the structure of "Molecule_EntityType" leafconcepts that do not have a classical organization and result from the fusion in vivo or in vitro of molecules from two (or more) different sources [2 (or more) sources]. fusion J-sequence 3 imgt_000024 Identifies, for cDNA, molecule entities with a germline J region. J-sequence is in germline configuration. true Identifies, whatever the molecule type, the structure of "Molecule_EntityType" leafconcepts that have a classical organization and have been modified in vitro with the purpose of humanization.
humanized For example, L-V-D-J-C-sequence of IG resulting from the grafting in vitro of the complementarity determining region (CDR) from one source (murine, rat...) to the framework regions (FR) (and with the constant region) from human [2 sources]. imgt_000087 transposed imgt_000063 Identifies, whatever the molecule type, a transgene or a retrotransposon that is permanently inserted in a chromosome. true imgt_000045 FunctionalityType The "FunctionalityType" concept allows to identify the type of functionality of EntityType leafconcepts. true 3 true imgt_000009 Identifies, for protein, molecule entities without a leader L region (mature form) and with a conventional region in undefined configuration. AA-chain is in undefined configuration. AA-chain imgt_000002 The "ConfigurationType" concept allows to identify, whatever the molecule type (gDNA, cDNA, mRNA or protein), the type of configuration of a gene, and by extension, the type of configuration of the Molecule_EntityType leafconcepts that contain it. ConfigurationType true imgt_000093 processed Identifies the structure of "Molecule_EntityType" leafconcepts which are genes (usually orphons) that have lost their introns. The "Molecule_EntityType" concept allows to identify the type of molecule entity. The "Molecule_EntityType" concept is defined by the "MoleculeType", "GeneType" and "ConfigurationType" concepts of identification and has properties identified in the "Molecule_FunctionalityType" and "Molecule_StructureType" concepts. imgt_000008 true Molecule_EntityType There are 38 "Molecule_EntityType" leafconcepts that, based on the "MoleculeType" ('gDNA', 'mRNA', 'cDNA' or 'protein'), identify the molecule entities of the four major "MoleculeUnit" leafconcepts that are genes (10), transcripts (11), cDNA sequences (11) and chains (6) (as indicated by the suffix). 
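The suffix convention described for the "Molecule_EntityType" leafconcepts can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `SUFFIX_TO_UNIT`, `EXPECTED_COUNTS` and `classify_entity` are hypothetical and not part of IMGT-ONTOLOGY, and the pairing of each suffix with a molecule type is read off the leafconcept definitions in this file (genes are identified "for gDNA", transcripts "for mRNA", sequences "for cDNA" and chains "for protein").

```python
from typing import Tuple

# Hypothetical lookup table: suffix -> ("MoleculeUnit" leafconcept, "MoleculeType").
# Assumption: the mapping is inferred from the definitions in this file.
SUFFIX_TO_UNIT = {
    "-gene": ("gene", "gDNA"),
    "-transcript": ("transcript", "mRNA"),
    "-sequence": ("cDNA_sequence", "cDNA"),
    "-chain": ("chain", "protein"),
}

# Leafconcept counts per molecule unit, as stated in the concept description.
EXPECTED_COUNTS = {"gene": 10, "transcript": 11, "cDNA_sequence": 11, "chain": 6}


def classify_entity(label: str) -> Tuple[str, str]:
    """Return (molecule unit, molecule type) for a Molecule_EntityType label."""
    for suffix, unit_and_type in SUFFIX_TO_UNIT.items():
        if label.endswith(suffix):
            return unit_and_type
    raise ValueError(f"no known Molecule_EntityType suffix in {label!r}")


if __name__ == "__main__":
    for label in ("V-D-J-gene", "L-V-J-C-transcript", "nt-sequence", "AA-chain"):
        print(label, "->", classify_entity(label))
    # The four per-unit counts add up to the 38 leafconcepts quoted in the text.
    print("total leafconcepts:", sum(EXPECTED_COUNTS.values()))
```

The lookup relies only on the label suffix, so it says nothing about configuration or functionality; those are carried by the separate "ConfigurationType" and "Molecule_FunctionalityType" concepts.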
Identifies the structure of "Molecule_EntityType" leafconcepts which are transcripts or cDNA sequences that have been submitted to complete RNA processing or splicing. imgt_000096 spliced true imgt_000113 Identifies any (coding or not coding) gene other than IG or TR genes with no leader L region (or signal peptide). conventional_without_leader Breed true imgt_000101 Allows to identify an interbred variant group inside a subspecies or species, for domestic animals. D-sequence Identifies, for cDNA, molecule entities with a germline D region. D-sequence is in germline configuration. true 3 imgt_000018 oligonucleotide imgt_000078 Identifies a short nucleic acid sequence unit. Usually designed in vitro, they are designated as primer, if used in polymerase chain reaction (PCR) amplification or for sequencing. Identifies, for mRNA, molecule entities with a germline J region and a C region in undefined configuration. J-C-transcript is in germline configuration. 4 imgt_000022 J-C-transcript true IDENTIFICATION true The IDENTIFICATION axiom of the Formal IMGT-ONTOLOGY or IMGT-Kaleidoscope postulates that molecules, cells, tissues, organs, organisms or populations, their processes and relations, have to be identified. The IDENTIFICATION axiom has generated the concepts of identification which provide the terms and rules to identify an entity, its processes and its relations. In molecular biology, the concepts of identification allow to identify the molecular components, the molecules, their processes and their relations at the genome, transcriptome, proteome, structure and function levels. imgt_000001 imgt_000014 3 true Identifies, for gDNA, molecule entities with a germline D gene. D-gene is in germline configuration. D-gene unspliced imgt_000099 Identifies the structure of "Molecule_EntityType" leafconcepts which are transcripts or cDNA sequences that have not been submitted to RNA processing or splicing.
Identifies the structure of "Molecule_EntityType" leafconcepts which are genes (usually orphons) that have all their introns. unprocessed imgt_000098 5 Identifies, for cDNA, molecule entities with a leader L region, a rearranged V region, a rearranged J region and a C region in undefined configuration. L-V-J-C-sequence is in rearranged configuration. L-V-J-C-sequence true imgt_000033 imgt_000085 Identifies, whatever the molecule type, the structure of "Molecule_EntityType" leafconcepts that have been modified by deliberate mutagenesis in vitro [1 source]. engineered imgt_000074 The "MoleculeUnit" concept allows to identify a molecule unit. MoleculeUnit true true 3 L-nt-transcript imgt_000111 Identifies, for mRNA, molecule entities with a leader L region and a conventional region in undefined configuration. L-nt-transcript is in undefined configuration. Domain concept has property in range concept. imgt_000107 _has_ Domain concept is a property for range concept. imgt_000106 _for_ Domain concept is defined by range concept. imgt_000109 is_defined_by Domain concept defines range concept. defines imgt_000108
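The four relations listed at the end of this file (`_has_`, `_for_`, `is_defined_by`, `defines`) can be illustrated with a minimal Python sketch. The triples below encode only what the "Molecule_EntityType" description states (it is defined by "MoleculeType", "GeneType" and "ConfigurationType", and has properties in "Molecule_FunctionalityType" and "Molecule_StructureType"). Treating `_has_`/`_for_` and `is_defined_by`/`defines` as inverse pairs is an assumption based on how their definitions are worded, not something the file declares; `INVERSE`, `TRIPLES` and `with_inverses` are hypothetical names.

```python
# Assumption: the relations come in inverse pairs, inferred from their definitions.
INVERSE = {
    "is_defined_by": "defines",
    "defines": "is_defined_by",
    "_has_": "_for_",
    "_for_": "_has_",
}

# (domain concept, relation, range concept), taken from the
# "Molecule_EntityType" concept description.
TRIPLES = [
    ("Molecule_EntityType", "is_defined_by", "MoleculeType"),
    ("Molecule_EntityType", "is_defined_by", "GeneType"),
    ("Molecule_EntityType", "is_defined_by", "ConfigurationType"),
    ("Molecule_EntityType", "_has_", "Molecule_FunctionalityType"),
    ("Molecule_EntityType", "_has_", "Molecule_StructureType"),
]


def with_inverses(triples):
    """Return the triples plus, for each one, the triple of the inverse relation."""
    closed = set(triples)
    for domain, relation, range_ in triples:
        closed.add((range_, INVERSE[relation], domain))
    return closed


if __name__ == "__main__":
    for triple in sorted(with_inverses(TRIPLES)):
        print(triple)
```

Under that assumption, stating "Molecule_EntityType is_defined_by GeneType" also yields "GeneType defines Molecule_EntityType", so only one direction of each pair needs to be recorded.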