THE INTERNATIONAL IMMUNOGENETICS INFORMATION SYSTEM®
The IMGT/HighV-QUEST version, IMGT/V-QUEST version and IMGT/V-QUEST
reference directory release are indicated at the top of the IMGT/HighV-QUEST
Home page. This information should always be checked: owing to
unforeseen delays, the IMGT/HighV-QUEST portal may not use the most
recent IMGT/V-QUEST version (IMGT/V-QUEST program versions) and/or
IMGT/V-QUEST reference directory (IMGT/V-QUEST reference directory releases).
Citing IMGT/HighV-QUEST:
Alamyar, et al.
IMGT/HighV-QUEST: The IMGT® web portal for immunoglobulin (IG) or antibody
and T cell receptor (TR) analysis from NGS high throughput and deep sequencing.
Immunome Res. 8:1:2 (2012).
LIGM:400
PMID:22647994
Alamyar E., et al., Methods Mol. Biol. 882:569-604 (2012).
PMID:22665256
LIGM:404
Li S., et al.
IMGT/HighV QUEST paradigm for T cell receptor IMGT clonotype, clonal expression
evaluation diversity and next generation repertoire immunoprofiling.
Nat. Commun. 4:2333 (2013).
Open access.
PMID:23995877
LIGM:419
Giudicelli V., et al.,
Autoimmun Infec Dis. 1(1) (2015).
doi:10.16966/aidoa.103.
Free Article
LIGM:448
IMGT/HighV-QUEST [1] (see also [2-5])
is the web portal of IMGT® [6], the
international ImMunoGeneTics information system® (http://www.imgt.org) for the
analysis of rearranged nucleotide sequences of the antigen
receptors (immunoglobulins (IG) or antibodies and T cell receptors
(TR)) [7, 8, 9]
obtained from next generation sequencing (NGS), based on
IMGT-ONTOLOGY [10] and the immunoinformatics
IMGT scientific rules [11].
IMGT/HighV-QUEST
[1, 3, 4]
is the high-throughput version of
IMGT/V-QUEST
[12, 13] and can analyse
thousands of immunoglobulin (IG) and T cell receptor (TR)
rearranged nucleotide sequences (up to 500 000 sequences) per run.
IMGT/HighV-QUEST
uses the
IMGT/V-QUEST
program to analyse user sequences [5] and the
IMGT/V-QUEST
reference directory [6] (IMGT/V-QUEST documentation).
IMGT/HighV-QUEST
outputs have a content similar to that of
IMGT/V-QUEST
[5].
IMGT/HighV-QUEST
statistical analysis is performed on
IMGT/HighV-QUEST
results selected by the user and includes the characterization of
the IMGT clonotype (AA) diversity and expression [4]
and their comparison in up to one million sequences.
Note that:
All IMGT/HighV-QUEST users must be
registered in order to use this platform.
IMGT/HighV-QUEST works with
rearranged V-J and V-D-J sequences and germline V-GENE, but does
not work with germline D-GENE or J-GENE.
IMGT/HighV-QUEST can analyse
sequences with DNA insertions or deletions (which do not respect
the IMGT unique numbering). For more information, see 'Search for
insertions and deletions'.
IMGT/HighV-QUEST can analyse
sequences containing two V domains (as scFv) if the option
‘Analysis of single chain Fragment variable (scFv)’ is selected
in Advanced functionalities of the IMGT/HighV-QUEST Search page.
IMGT/HighV-QUEST does not work
with out-of-frame pseudogenes as they cannot be numbered
according to the codon (or amino acid) IMGT unique numbering [14, 15]. You may use
IMGT/BlastSearch (http://www.imgt.org/BlastSearch)
in order to compare out-of-frame sequences against
'F+ORF+inframeP' genes and alleles IMGT reference sequences
(select for 'Database' : 'IMGT/GENE-DB reference sequences').
IMGT/HighV-QUEST does not work,
or will give aberrant results, for partial sequences that are too short,
sequences containing a cluster of V-GENE, or sequences with a 5'UTR or
3'UTR that is too long. For these sequences, you may use
IMGT/BlastSearch (http://www.imgt.org/BlastSearch)
in order to identify the closest sequences from IMGT/LIGM-DB
(select for 'Database' : 'IMGT/LIGM-DB').
IMGT/HighV-QUEST
analysis
IMGT/HighV-QUEST
submission
Once users have registered and their registration has been accepted
by the IMGT/HighV-QUEST administrator, they can log in to
IMGT/HighV-QUEST
using their e-mail address and the password they selected during
registration. The first page shown after logging in is the
IMGT/HighV-QUEST
Search Page.
Users must enter:
- a title for their new analysis (50 characters or less)
and select:
- the species
- the antigen receptor type (IG or TR) or the locus (for example, IGL)
- the path to their local file to be submitted.
Since IMGT/HighV-QUEST is designed for batches of
large numbers of sequences, there is no copy/paste submission.
The submitted file should contain IG or TR rearranged nucleotide
sequences in FASTA
format. The file must be formatted in text only (RTF or DOC
formats are not accepted).
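Before uploading, a file can be checked locally. The following Python
sketch is not part of IMGT/HighV-QUEST: it counts FASTA records, rejects
content that is clearly not plain-text FASTA (for example an RTF header),
and compares the count with the limit of 500 000 sequences per run stated
above. The function names and the set of accepted sequence characters are
assumptions made for illustration.

```python
MAX_SEQUENCES_PER_RUN = 500_000  # limit stated in this documentation

# Assumption: nucleotide letters plus N, '.' and '-'; adjust if needed.
ALLOWED = set("ACGTUNacgtun.-")


def count_fasta_records(lines):
    """Count FASTA records in an iterable of text lines."""
    count = 0
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            count += 1  # a header line starts a new record
        elif count == 0:
            raise ValueError(f"line {lineno}: text found before the first '>' header")
        elif not set(line) <= ALLOWED:
            raise ValueError(f"line {lineno}: unexpected characters in sequence")
    return count


def check_submission(lines):
    """Return the number of records, or raise ValueError if the file looks unusable."""
    n = count_fasta_records(lines)
    if n == 0:
        raise ValueError("no FASTA record found")
    if n > MAX_SEQUENCES_PER_RUN:
        raise ValueError(f"{n} sequences exceed the limit of {MAX_SEQUENCES_PER_RUN}")
    return n
```

An open text file can be passed directly, since iterating over it yields
lines: `check_submission(open("my_sequences.fasta"))`, where the file name
is a placeholder.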
Example sequence files in FASTA format are available:
- a FASTA file containing human IG sequences
- a FASTA file containing human TR sequences
Other sets of sequences to test the IMGT/HighV-QUEST functionalities are
also available.
Users can choose to receive an e-mail notification:
- when the analysis is queued in the local analysis queue
- when the analysis is submitted for computer processing
- when the analysis is completed and the results can be downloaded
- 5 days before the analysis expires (15 days after they become
available, the results are permanently removed from the server). The
expiration notification is not optional.
Display results
The 'Display results' choice is identical to
that of IMGT/V-QUEST (click here to see this part in IMGT/V-QUEST documentation).
'Detailed view' will provide individual files (one file for each
sequence analysed) as an option for submissions of fewer than 150 000
sequences (click 'yes' to select the option). Individual
files are not provided for submissions of more than 150 000 sequences.
'Files in CSV' will provide outputs with a content
similar to that of the IMGT/V-QUEST
Excel file sheets [9]. The 11 CSV files are
selected by default. They should be kept selected if IMGT/HighV-QUEST statistical analysis is
performed.
Advanced
parameters
The 'Advanced
parameters' selection is identical to that of IMGT/V-QUEST (click here to see this part in IMGT/V-QUEST documentation).
Note that: The option "Search for
insertions and/or deletions" of Advanced parameters is selected
by default.
IMGT/HighV-QUEST uses
the IMGT/V-QUEST reference directory sets [2] to analyse user sequences (click here to see this part in IMGT/V-QUEST documentation).
IMGT/HighV-QUEST
results
Results file format
IMGT/HighV-QUEST results consist of sequence analysis
results in CSV text files. These files are archived together in
a single file in .TXZ
format. This format was chosen because of its common use. Users
may use archive tools available for Windows or other operating systems
(e.g., 7-Zip) to extract (unzip) the files.
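A TXZ file is a tar archive compressed with xz, so it can also be read
programmatically. As a minimal sketch (the helper name
`list_txz_members` is hypothetical and not part of IMGT/HighV-QUEST),
Python's standard `tarfile` module can list the files contained in a
results archive:

```python
import tarfile


def list_txz_members(source):
    """Return the names of the regular files in a .txz archive.

    `source` is either a file path or a binary file object.
    """
    if hasattr(source, "read"):
        archive = tarfile.open(fileobj=source, mode="r:xz")
    else:
        archive = tarfile.open(name=source, mode="r:xz")
    with archive:
        # Keep regular files only (folders are skipped).
        return [member.name for member in archive.getmembers() if member.isfile()]
```

To unpack the archive to disk, the `extractall` method of the opened
archive can be used instead; only unpack archives from a trusted source.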
Results folders
and files
There are two folders:
- the main folder with 11 (or 12, if scFv was selected) CSV
files
- the individual files folder
Results file
contents
3.1 CSV
files
If all options were selected in the IMGT/HighV-QUEST Search page
(including ‘Analysis of single chain Fragment variable (scFv)’
in Advanced functionalities), the main folder contains 12 CSV
files.
Summary
(always provided in a spreadsheet and in a TXZ archive)
Parameters
(always provided in a spreadsheet and in a TXZ archive)
scFv
(only present if 'Analysis of single chain Fragment variable
(scFv)' was selected in Advanced functionalities)
The Parameters.txt and Summary.txt files are always provided.
Note that: IMGT/HighV-QUEST statistical
analysis can only be performed on results that include the first 11
CSV files.
The content of the CSV files is the same as that of the IMGT/V-QUEST Excel file sheets (click here
to see this part in IMGT/V-QUEST
documentation). For correspondence between the CSV file content
and the IMGT/V-QUEST online display, see [5, 11].
3.2 Individual files
For submission < 150 000 sequences, the individual files are
provided if the option has been selected by the user in
'Detailed View' of the IMGT/HighV-QUEST
Search page.
One individual file is generated for each sequence analysed.
Each individual file contains analysis results for the options
selected by the user in 'Detailed View' of the IMGT/HighV-QUEST Search page.
At the top of each individual file, the versions of the IMGT/HighV-QUEST and IMGT/V-QUEST programs and the IMGT/V-QUEST reference directory release are
indicated.
The results of the individual files are the same as those of the
IMGT/V-QUEST text results (click here
to see this part in IMGT/V-QUEST
documentation).
Note that: IMGT/StatClonotype works only with uploaded files
(.txt format) from the IMGT/HighV-QUEST statistical analysis output ('stats_xxx'
file available in 'data' directory of the IMGT/HighV-QUEST statistical
analysis output, where 'xxx' is the batch name and the locus type), as
described in the IMGT/StatClonotype
documentation. To get these files, you therefore need to launch the
IMGT/HighV-QUEST statistical analysis.
IMGT/HighV-QUEST
statistical analysis terminology
'Single IMGT
V gene and allele name' used in IMGT/HighV-QUEST statistical
analysis
Some V genes and alleles are always
found with identical results using IMGT/V-QUEST or IMGT/HighV-QUEST.
This happens:
- when the alleles differ only by nucleotide (nt) differences located
3' of the last codon of the V-REGION taken into account for the
evaluation of the alignment score and the identification of the closest
gene and allele (according to the IMGT unique numbering, codon 104 for
IGH and the TR loci, codon 109 for IGK and IGI, and codon 110 for IGL),
- when the alleles belong to duplicated genes with identical
nucleotide sequences in the V-REGION analysed by IMGT/V-QUEST or
IMGT/HighV-QUEST.
In order to avoid the filtering-out of these sequences,
the IMGT V gene and allele is assigned to a 'Single IMGT V gene and
allele name' used in IMGT/HighV-QUEST statistical
analysis (click here for the list).
Filtered-in
and filtered-out sequences
Total: sequences of the IMGT/HighV-QUEST output selected by
the user and submitted to the statistical analysis.
'1 copy': sequences in one copy, and therefore different by
their length and/or their sequence, and retained in 'filtered-in'
sequences. For each set of identical sequences, only one
copy is retained in '1 copy' and the other redundant sequences for
that copy are put into 'More than 1'.
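The split between '1 copy' and 'More than 1' can be pictured with a
short Python sketch. It is only an illustration of the definitions
above, not the IMGT/HighV-QUEST implementation: the function name is
hypothetical, and keeping the first occurrence and ignoring letter case
are assumptions.

```python
from collections import Counter


def split_one_copy(sequences):
    """Split (seq_id, nucleotide_sequence) pairs into '1 copy' and 'More than 1'.

    Returns (one_copy, more_than_1) where one_copy maps each distinct
    sequence to the ID retained for it, and more_than_1 counts the
    redundant copies of that sequence.
    """
    one_copy = {}
    more_than_1 = Counter()
    for seq_id, seq in sequences:
        key = seq.upper()  # assumption: comparison is case-insensitive
        if key in one_copy:
            more_than_1[key] += 1  # redundant copy of a retained sequence
        else:
            one_copy[key] = seq_id  # assumption: the first occurrence is retained
    return one_copy, more_than_1
```

With this split, the number of '1 copy' plus the number of 'More than 1'
equals the number of sequences given as input.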
The following six categories are excluded from statistical
analysis (filtered-out sequences):
'More than 1': identical sequences (after one copy of each set of
identical sequences has been retained in '1 copy'). The
'More than 1' are excluded from the statistical analysis per se to
avoid redundancy, the number of 'More than 1' being added to the
corresponding '1 copy' ONLY at the end of statistical analysis.
'No J-GENE': sequences for which IMGT/HighV-QUEST did not
find any J-GENE; these sequences are usually very short in 3'.
'No junction': sequences for which a junction could not be
identified (e.g. no evidence of anchors).
'Warnings': sequences with warnings for the V-REGION
('different CDR lengths' and/or 'id<85%').
In the Warnings files:
'different CDR lengths' means sequences with different AA lengths for
CDR1-IMGT and/or CDR2-IMGT compared to the lengths of the CDR1-IMGT
and/or CDR2-IMGT, respectively, of the closest identified germline V
gene and allele.
'id<85%' means sequences with a V-REGION having a percent of identity
<85% compared to the V-REGION of the closest identified germline V gene
and allele.
'Unknown functionality': sequences for which no functionality
was detected. IMGT/HighV-QUEST is intended for the analysis
of rearranged IG and TR sequences. The functionality identified by
IMGT/HighV-QUEST is either 'productive' (no stop codon and in-frame
junction) or 'unproductive' (stop codons, out-of-frame junction).
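As an illustration of the categories above, the sketch below assigns a
result record to a filtered-out category. The record keys
(`j_gene`, `junction`, `v_identity`, `cdr_lengths`,
`germline_cdr_lengths`, `functionality`) and the order in which the
checks are applied are assumptions made for this example; they do not
describe the IMGT/HighV-QUEST file columns or internal logic. 'More than
1' and 'No results' are not covered here.

```python
def filter_category(rec):
    """Return the filtered-out category of a record, or None if it is kept.

    `rec` is a plain dict with hypothetical keys; `cdr_lengths` and
    `germline_cdr_lengths` are (CDR1-IMGT, CDR2-IMGT) lengths in AA.
    """
    if not rec.get("j_gene"):
        return "No J-GENE"
    if not rec.get("junction"):
        return "No junction"
    # Warnings: 'id<85%' and/or 'different CDR lengths'
    if rec["v_identity"] < 85.0 or rec["cdr_lengths"] != rec["germline_cdr_lengths"]:
        return "Warnings"
    if rec.get("functionality") not in ("productive", "unproductive"):
        return "Unknown functionality"
    return None
```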
The statistical analysis is performed on the '1 copy'
category divided into two sets, depending on the IMGT/HighV-QUEST
results:
'single gene': only one gene identified by IMGT/HighV-QUEST.
'Single gene' refers to V, D and J analysed separately or in
combination.
'several genes': several genes identified by
IMGT/HighV-QUEST. 'Several genes' refers to V, D and J
analysed separately or in combination.
'single allele': only one gene and allele identified by
IMGT/HighV-QUEST. 'Single allele' refers to V, D and J
analysed separately or in combination.
'several alleles (or genes)': several alleles (or genes)
identified by IMGT/HighV-QUEST. 'Several alleles (or
genes)' refers to V, D and J analysed separately or in combination.
Definition of an 'IMGT
clonotype (AA)'
An 'IMGT clonotype (AA)' is defined by a unique V-(D)-J
rearrangement (with IMGT gene and allele names determined by
IMGT/HighV-QUEST at the nucleotide level) and a unique CDR3-IMGT AA
(in-frame) junction sequence.
Sequences
assigned to an IMGT clonotype (AA) comprise:
'single allele' sequences with the same V and J genes and
alleles and same CDR3-IMGT (AA).
'several alleles (or genes)' sequences with the same V and
J genes and alleles among the different identified V and J genes
and alleles and the same CDR3-IMGT (AA).
sequences in 'More than 1' which have their '1 copy' among
the sequences assigned to a given IMGT clonotype (AA).
All sequences assigned to IMGT clonotypes (AA) are
in-frame and have the conserved two anchors C104 and F/W118 (for
example, F for TRB, W for IGH), V and J functional or ORF.
IMGT clonotype (AA) representative sequence: each
IMGT clonotype (AA) has a representative sequence chosen amongst the
assigned sequences as the longest one with the highest percentage
of identity with the germline V gene and allele.
Definition of an 'IMGT
clonotype (nt)'
An IMGT clonotype (nt) is defined by a unique V-(D)-J
rearrangement (with IMGT gene and allele names determined by
IMGT/HighV-QUEST at the nucleotide level) and a unique CDR3-IMGT nt
(in-frame) junction sequence.
Several IMGT clonotypes (nt) may correspond to one IMGT
clonotype (AA) if their CDR3-IMGT differ by one or more nucleotides
from the CDR3-IMGT of the representative nucleotide sequence of the
IMGT clonotype (AA).
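The two definitions can be summarized with a small Python sketch, given
here only as an illustration. The record keys and gene names are
hypothetical, the handling of 'several alleles (or genes)' and of the
anchors is omitted, and the tie-breaking rule for the representative
sequence (length first, then V identity) is an assumption.

```python
from collections import defaultdict


def group_clonotypes(records):
    """Group records by IMGT clonotype (AA), then by IMGT clonotype (nt).

    Returns {(v, d, j, cdr3_aa): {cdr3_nt: [records]}}.
    """
    clonotypes = defaultdict(lambda: defaultdict(list))
    for rec in records:
        aa_key = (rec["v"], rec["d"], rec["j"], rec["cdr3_aa"])
        clonotypes[aa_key][rec["cdr3_nt"]].append(rec)
    return clonotypes


def representative(records):
    """Pick the longest sequence; ties are broken by the highest V identity."""
    return max(records, key=lambda r: (len(r["sequence"]), r["v_identity"]))
```

In this sketch, two sequences with the same V, D and J genes and alleles
whose CDR3-IMGT encode the same amino acids with different codons fall
into one clonotype (AA) containing two clonotypes (nt).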
IMGT/HighV-QUEST
statistical analysis submission
Prerequisite
IMGT/HighV-QUEST
statistical analysis can only be performed on results that include at
least the first 11 (or 12, if scFv) CSV files (they should have been selected in 'Files
in CSV' at the
IMGT/HighV-QUEST
Search page when launching the IMGT/HighV-QUEST analysis as shown in
the screen capture below). For scFv, IMGT/HighV-QUEST statistics are
provided per V-DOMAIN, independently.
Statistical
analysis submission
Users must:
- enter a title for the statistical analysis
- define and select the batch or batches for the
IMGT/HighV-QUEST
statistical analysis. The maximum number of sequences is 500,000 for a
single batch and 1,000,000 for multiple batches. Batches are selected
one by one from the user's 'Analysis history' page (all selected
batches should be for the same locus)
- enter a batch ID for each analysis selected.
The statistical analysis can then be launched.
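The constraints above can be checked before submission. The sketch below
reads the two totals as upper limits and assumes a simple input
structure (a dictionary mapping each batch ID to the locus and number of
sequences of its selected analyses); the function name and this
structure are hypothetical.

```python
MAX_PER_BATCH = 500_000      # single-batch total stated above
MAX_ALL_BATCHES = 1_000_000  # multiple-batch total stated above


def check_batches(batches):
    """Return a list of problems for {batch_id: [(locus, nb_sequences), ...]}."""
    problems = []
    loci = set()
    grand_total = 0
    for batch_id, analyses in batches.items():
        total = sum(nb for _, nb in analyses)
        grand_total += total
        loci.update(locus for locus, _ in analyses)
        if total > MAX_PER_BATCH:
            problems.append(f"batch {batch_id}: {total} sequences exceed {MAX_PER_BATCH}")
    if grand_total > MAX_ALL_BATCHES:
        problems.append(f"all batches: {grand_total} sequences exceed {MAX_ALL_BATCHES}")
    if len(loci) > 1:
        problems.append(f"mixed loci: {sorted(loci)}")
    return problems
```

An empty list means that the selection respects the limits and that all
selected analyses concern the same locus.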
IMGT/HighV-QUEST
statistical analysis output
The IMGT/HighV-QUEST statistical output is provided as a txz file.
After extraction of the archive, open the file "open_to_start.html"
with a Web browser. The
IMGT/HighV-QUEST
statistical output is organized in 6 sections:
The 'Selected parameters'
table recapitulates the parameters selected by the user at the
statistical analysis submission. It provides the title, the species,
the receptor type (or locus), IMGT reference directory set, Search
for insertions/deletions ('yes' or 'no'), Nb of sequences (sum of all
batches), Batch IMGT clonotype comparison ('yes' or 'no'), User
notification (complete and/or expire) (complete: when the analysis is
completed, expire: 15 days after the completion date), Analysis date
(date and time), and Comments (submitted by the user).
The 'Batch list table' displays for each batch, the batch ID (in
orange) and as many lines as the number of IMGT/HighV-QUEST outputs
selected by the user for a given batch. Each line displays the title
of the selected IMGT/HighV-QUEST output, with the Nb of sequences,
Species, Receptor type (or locus), IMGT/HighV-QUEST version,
IMGT/V-QUEST version and IMGT/V-QUEST reference directory release.
Result summary
for batches
The 'Result summary for batches'
table recapitulates, for each batch, the following:
- Batch ID, Total (Nb of sequences in the selected batch) with average
length (Avr Len),
- then the different IMGT/HighV-QUEST sequence categories '1 copy',
'1 copy' with indels, 'More than 1', 'More than 1' with indels,
No J-GENE, No junction, Warnings, Unknown functionality, No results,
with, for each category, the number of sequences and the average length.
Result
summary for IMGT clonotypes (AA)
The 'Result summary for IMGT clonotypes (AA)' table recapitulates for
each batch (column 1), and then for each locus (column 2), the following:
- Nb of IMGT clonotypes (AA) (equal to the Nb of IMGT clonotype (AA)
representative sequences),
- Nb of sequences assigned to IMGT clonotypes (AA) (per definition
in-frame), Nb of in-frame sequences not assigned to IMGT clonotypes (AA)
(e.g. anchor AA changes), Nb of in-frame sequences,
- Nb of in-frame productive sequences (no stop codons), Nb of in-frame
unproductive sequences (stop codons),
- Nb of out-of-frame sequences,
- Nb of sequences '1 copy' + 'More than 1' (= Nb of analysed sequences),
- Nb of 'single gene' and Nb of 'several genes',
- then Nb of submitted sequences per batch.
The percentage of Nb of sequences '1 copy' + 'More than 1' is
calculated versus the Nb of submitted sequences. The percentage of all
other columns is calculated versus the Nb of sequences '1 copy' +
'More than 1'.
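The two percentage rules can be written down as a short Python sketch
(the function and argument names are chosen for this example only):

```python
def summary_percentages(nb_submitted, nb_analysed, other_columns):
    """Return (% analysed, {column: %}) following the two rules above.

    nb_analysed is the Nb of sequences '1 copy' + 'More than 1';
    other_columns maps a column name to its Nb of sequences.
    """
    if nb_submitted <= 0 or nb_analysed <= 0:
        raise ValueError("counts must be positive")
    # '1 copy' + 'More than 1' is expressed versus the submitted sequences.
    pct_analysed = 100.0 * nb_analysed / nb_submitted
    # Every other column is expressed versus '1 copy' + 'More than 1'.
    pct_other = {name: 100.0 * nb / nb_analysed for name, nb in other_columns.items()}
    return pct_analysed, pct_other
```

For example, with 1000 submitted sequences, 800 analysed sequences and
600 in-frame sequences, the analysed sequences represent 80% of the
submitted sequences and the in-frame sequences 75% of the analysed
sequences.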
4.1
'Results categories' and V, D, J genes and alleles for
genotype analysis ('1 copy' 'single gene' for V and J)
4.1.1
Overview
Results are archived in a single
TXZ
file called stat_version_1.txz. A TXZ file is provided for each
batch. When extracted, the TXZ file of each batch contains:
- 5 reports in PDF
- 1 README file
- 1 'graphics' folder containing separate copies of the graphical
elements in PNG
4.1.2
Content of the IMGT reports
The content of the IMGT reports includes 9 sections.
All sections are found in report 1, whereas reports 2 to 5
contain only some of them.
1_IMGT_report_all.pdf: sections 1 to 9
2_IMGT_report_summary.pdf: sections 1 to 5
3_IMGT_report_1copy_single-gene.pdf: section 6
4_IMGT_report_1copy_several-genes.pdf: section 7
5_IMGT_report_filtered-out_sequences.pdf: sections 8 and 9
Comments
Comments are those added by the user in "Batch comments
(optional)". PDF documents are normally not editable; this
functionality was therefore added to IMGT/HighV-QUEST so that users can
include optional comments in the final report and recognize it later.
Analysis list
The 'Analysis list' recapitulates the list of
IMGT/HighV-QUEST sets analysed with title, Nb of sequences, IMGT/HighV-QUEST reference directory (species and
receptor type or locus), IMGT/HighV-QUEST
version, IMGT/V-QUEST version and IMGT/V-QUEST reference directory release.
Note that: the IMGT/HighV-QUEST and IMGT/V-QUEST versions and the IMGT/V-QUEST reference directory release are
important information for statistical analysis. To check the
details of each version and upgrade, see the IMGT/HighV-QUEST Upgrades
and versions, IMGT/V-QUEST reference
directory releases, and IMGT/V-QUEST program
versions pages.
Summary table
The Summary table shows the chosen parameters for the
statistical analysis and the categories of sequences as
identified and filtered by IMGT/HighV-QUEST,
with the Nb of sequences and the sequence average length (nt) for each
category. The IMGT/HighV-QUEST statistical
analysis is performed only on the filtered-in sequences ('1
copy'). The 'More than 1' sequences are aggregated to the '1 copy'
at the end of the statistical analysis, once it has been performed
on the '1 copy'. Filtered-out sequences include 'No J-GENE', 'No
junction', 'Warnings', 'Unknown functionality' and 'No
results'.
Terminology
Same as above. This section helps users understand the
general terminology of the statistical analysis report.
'1 copy' with 'single gene' gene and allele tables and histograms
The tables show the IMGT gene and allele name, the
number of '1 copy' (Total), the sequence average length (in nb of
nt), the V-REGION average length (in nb of nt), and 'id=100%', which
is the number of sequences with 100% identity to the germline.
The colored lines (green: V, red: D, yellow: J) display the results
per gene. For each gene, the results are then displayed per allele
(white lines), with the functionality of the germline allele and, for
'id=100%', the percentage of these sequences relative to 'Total'
(between parentheses). The functionality
of an allele can be:
F: Functional
P: Pseudogene
ORF: Open Reading Frame
The functionality is shown between parentheses, (F) and (P),
when the corresponding germline gene has not yet been isolated.
It is shown between brackets, [F] and [P], when it is not known
if the sequence is germline or rearranged.
Histograms display the
number of '1 copy' with 'single gene' for each V, (D) and J
genes. Genes are shown according to their position from 5' to 3'
in the concerned locus. Unmapped genes are located at the top of
the histograms.
'1 copy' with 'several genes' gene and allele tables
These tables have the same header and the same type of
results as section 6 ('1 copy' with 'single gene'). There are as many
different lines as different results, as proposed by IMGT/HighV-QUEST,
at the gene (colored lines) and/or allele level (white lines).
These results are usually obtained for short sequences which do
not allow the assignment by IMGT/HighV-QUEST to a single gene or
single allele.
Sequences in 'More than 1'
Sequences in 'More than 1' (violet-blue lines) are
shown below each corresponding '1 copy' (green lines). The 'More
than 1' are excluded from the statistical analysis per se to
avoid redundancy, the number of 'More than 1' being added to the
corresponding '1 copy' ONLY at the end of statistical analysis.
Other filtered-out sequences
All other filtered-out sequences ('No J-GENE', 'No
junction', 'Warnings', 'Unknown functionality', and 'No results')
are provided in separate similar tables, with the 'Sequence
number' and the 'Sequence ID'.
4.2
IMGT clonotype (AA and nt) results per locus
Overview
IMGT clonotype (AA and nt) results per locus are
provided in 10 sections (HTML pages).
4.2.1 IMGT clonotypes (AA) per Nb
The Nb of IMGT clonotypes (AA) is given by the total number of ID
(last line in column #). The 'IMGT clonotypes (AA) per
Nb' table provides for each IMGT clonotype (AA):
- ID, nb (#) and experimental ID (Exp. ID).
- Nb, Nb of '1 copy', Nb of 'More than 1' and Total (= nb
of sequences assigned to that IMGT clonotype (AA)).
- IMGT clonotype (AA) definition: V, D and J genes and
alleles, CDR3-IMGT length (AA), CDR3-IMGT sequences (AA),
Anchors 104, 118.
- IMGT clonotype (AA) representative sequence: V %
(percentage of identity of the V-REGION compared to the closest
germline V gene), Sequence length, Functionality (as identified
by IMGT/HighV-QUEST), Sequence ID with link
to the file of the representative sequence.
- IMGT clonotypes (nt): Sequences file ('1 copy') with a
link to each 'Sequences file' containing the '1 copy' sequences
in FASTA format assigned to a given IMGT clonotype (AA).
In this table, the results are sorted by decreasing nb of '1 copy'
and then by decreasing nb of 'More than 1'.
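This two-level ordering corresponds to a sort on a pair of keys. As a
minimal Python sketch, assuming each table row is a dictionary with
the hypothetical keys `one_copy` and `more_than_1`:

```python
def sort_clonotypes(rows):
    """Sort rows by decreasing nb of '1 copy', then decreasing nb of 'More than 1'."""
    return sorted(rows, key=lambda row: (-row["one_copy"], -row["more_than_1"]))
```

Rows with the same number of '1 copy' are therefore ordered by their
number of 'More than 1'.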
4.2.2 IMGT clonotypes (AA) per
Nb with detailed clonotypes (nt)
The Nb of IMGT clonotypes (AA) is given by the total number of ID
(last line in column #). The same information, as in IMGT
clonotypes (AA) per Nb, is provided for each IMGT clonotype (AA)
and displayed in the pink line, with under each clonotype (AA),
the corresponding IMGT clonotype(s) (nt) displayed on separate
lines.
The following information is provided:
- ID nb (the same as that of the IMGT clonotype (AA)) (#),
- CDR3-IMGT length (nt),
- nb of different CDR3-IMGT (nt),
and then, for each CDR3-IMGT (nt):
- CDR3-IMGT sequence (nt),
- nb of different nt in the CDR3 (compared to the sequence
(nt) of the IMGT clonotype (AA)),
- V gene and allele, D gene and allele, J gene and allele,
- Anchors 104,118: 'C,F' or 'C,W',
- V-REGION identity % mean, V-REGION length mean,
- J-REGION identity % mean, J-REGION length mean,
- Sequence length mean,
- nb of '1 copy', nb of 'More than 1' and Total (= nb of
sequences assigned to that IMGT clonotype (nt)) (the 3 columns on
the right).
For a given IMGT clonotype (AA), the sums of '1 copy',
'More than 1' and Total of the IMGT clonotypes (nt) are equal,
respectively, to those of the IMGT clonotype (AA) (boxes 3, 4, 5
of the pink line).
4.2.3 IMGT clonotypes (AA) per V gene
The same information, as IMGT clonotypes (AA) per Nb, is
provided but sorted here alphabetically by V gene and allele
name.
4.2.4 IMGT clonotypes (AA)
per V gene with detailed clonotypes (nt)
The same information, as IMGT clonotypes (AA) per Nb with
detailed clonotypes (nt), is provided but sorted here
alphabetically by V gene and allele name. For a given
IMGT clonotype (AA), the sums of '1 copy', 'More than 1' and Total
of the IMGT clonotypes (nt) are equal, respectively, to the values in
the pink boxes preceding the sequences file.
4.2.5 IMGT clonotypes (AA) per CDR3-IMGT length (AA)
The same information, as IMGT clonotypes (AA) per Nb, is provided
but sorted here by CDR3-IMGT length. This display allows
identification of sequences assigned to different IMGT clonotypes
(AA) whereas, most probably, they represent a single IMGT
clonotype (AA).
4.2.6 IMGT clonotypes (AA)
per CDR3-IMGT length (AA) with detailed clonotypes (nt)
The same information, as IMGT Clonotypes (AA) per Nb, is
provided with detailed clonotypes (nt) sorted by CDR3-IMGT length
(AA) and, under a same length, by CDR3-IMGT sequence (AA)
alphabetical order.
4.2.7 IMGT clonotypes (AA) by
CDR3-IMGT sequence (AA) alphabetical order with detailed
clonotypes (nt)
The same information, as IMGT clonotypes (AA) per Nb with
detailed clonotypes (nt), is provided sorted by CDR3-IMGT
sequence (AA) alphabetical order following a decreasing CDR3-IMGT
length. The format of the table is the same as in 4.2.6, 'IMGT
clonotypes (AA) per CDR3-IMGT length (AA) with detailed
clonotypes (nt)'.
4.2.8 IMGT clonotype (AA) diversity and expression histograms: per
V, (D), J-GENE and per CDR3-IMGT length
- IMGT clonotype (AA) diversity histograms: Nb of IMGT clonotypes
(AA) per V-GENE (green color), D-GENE (for IGH, TRB, TRD) (red
color) and J-GENE (yellow color) and per CDR3-IMGT length.
- IMGT clonotype (AA) expression histograms: Nb of sequences
assigned to an IMGT clonotype (AA) per V-GENE, D-GENE (for IGH,
TRB, TRD) and J-GENE (pink color).
- IMGT clonotype (AA) histograms per CDR3-IMGT length.
4.2.9 IMGT clonotype (AA) diversity and expression tables: per V,
(D), J-GENE and per CDR3-IMGT length
- IMGT clonotype (AA) diversity table: Nb of IMGT clonotypes (AA)
per V-GENE, D-GENE (for IGH, TRB, TRD), J-GENE.
- IMGT clonotype (AA) expression table: Nb of sequences assigned to an
IMGT clonotype (AA) per V-GENE, D-GENE (for IGH, TRB, TRD),
J-GENE.
- IMGT clonotypes (AA) table per CDR3-IMGT length.
4.2.10
V gene and allele table rearrangements
The table shows for each V gene name and V allele name:
Nb of sequences assigned to an IMGT clonotype (AA),
Nb of different IMGT clonotypes (AA),
Nb of out-of-frame sequences,
Nb of other categories sequences.
Clicking on the red and yellow squares, in the "V gene name" and
"V allele name" columns, gives access to the D and J genes and
alleles, respectively, involved in the rearrangements of a given V
gene or allele.
The same presentation as 'IMGT clonotypes (AA) per CDR3-IMGT length
(AA)' is provided, but sorted here by IMGT clonotypes (AA) present in a
single batch ("Present in MID1", lightblue and lightsteel blue lines)
and IMGT clonotypes (AA) common to 2 (or more) batches ("Present in
MID1 and MID2", lightpink and light yellow lines). The
information is paginated by page size and by the different batch
combinations and their common IMGT clonotypes (AA).
The Synthesis table indicates the number of IMGT clonotypes (AA)
(diversity) and the number of sequences assigned to IMGT clonotypes
(AA) (expression) only present ('exclusive') in a single batch or
common to two or more batches. There is one line for each
single batch (Nb of batches: 1) and for each combination of batches
(Nb of batches: 2 or >2).
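The principle of the Synthesis table can be sketched in Python as
follows. This is an illustration only: the input structure (for each
batch, a dictionary mapping a clonotype key to its number of assigned
sequences) and the rule that two clonotypes match when their keys are
equal are assumptions.

```python
from collections import defaultdict


def synthesis(batches):
    """Count diversity and expression per combination of batches.

    batches: {batch_id: {clonotype_key: nb_of_sequences}}
    Returns {frozenset(batch_ids): (nb_clonotypes, nb_sequences)}.
    """
    presence = defaultdict(set)  # clonotype key -> batches containing it
    for batch_id, clonotypes in batches.items():
        for key in clonotypes:
            presence[key].add(batch_id)
    table = defaultdict(lambda: [0, 0])
    for key, batch_ids in presence.items():
        combination = frozenset(batch_ids)
        table[combination][0] += 1  # diversity: one more clonotype
        table[combination][1] += sum(batches[b][key] for b in batch_ids)  # expression
    return {combination: tuple(counts) for combination, counts in table.items()}
```

A combination made of a single batch holds the clonotypes exclusive to
that batch; a combination of two or more batches holds their common
clonotypes.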
5.3 IMGT
clonotypes (AA) comparison: Result summary table per V-GENE, D-GENE
(for IGH, TRB, TRD), J-GENE
The table recapitulates per gene of each batch:
'Nb of IMGT clonotypes (AA)'
'Nb of in-frame sequences assigned to IMGT clonotypes (AA)'.
The order of genes is the same as in the locus, with unmapped genes in
the first lines.
'data' directory
The 'data' directory contains the 'stats_xxx' (.txt format)
file(s) where 'xxx' is the batch name and the locus type.
At least two 'stats_xxx' files are needed to launch a comparative
analysis in the IMGT/StatClonotype
tool.
Each file contains 26 columns (see the table
'Content of the stats_xxx file' below).
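Whether a 'data' directory holds enough files for a comparison can be
checked with a few lines of Python. The sketch relies only on the
'stats_' prefix and the .txt extension mentioned above; the function
names are hypothetical and the example file names are invented.

```python
def select_stats_files(names):
    """Keep the names that look like 'stats_xxx' files in .txt format."""
    return sorted(n for n in names if n.startswith("stats_") and n.endswith(".txt"))


def can_compare(names):
    """At least two 'stats_xxx' files are needed for a comparative analysis."""
    return len(select_stats_files(names)) >= 2
```

The list of names can be obtained, for example, with `os.listdir` on
the extracted 'data' directory.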
This work was granted access to the HPC resources
of CINES under the allocation 2014-036029 made by GENCI (Grand
Equipement National de Calcul Intensif).
References:
[1] Alamyar E., et al., Immunome Res. 8:1:2 (2012).
doi:10.4172/1745-7580.1000056. LIGM:400
[2] Alamyar E., et al., IMGT/HighV-QUEST: A High-Throughput
System and Web Portal for the Analysis of Rearranged Nucleotide
Sequences of Antigen Receptors, JOBIM2010, Paper 63 (2010).
[3] Alamyar E., et al., Methods Mol. Biol. 882:569-604
(2012). PMID:22665256 LIGM:404
[4] Li S., et al., Nat. Commun. 4:2333 (2013).
doi:10.1038/ncomms3333. Open access. PMID:23995877 LIGM:419