From gene to protein analysis

  1. First find information about the protein in UniProtKB/Swiss-Prot (http://www.uniprot.org/).

    Check the status of the entry (TrEMBL or Swiss-Prot annotation) in "Entry history".

  2. In the Swiss-Prot entry, check in "Cross-reference > Sequence database" whether "DNA" or "mRNA" sequences are known.

    Note that in the generalist databases (EMBL, GenBank and DDBJ), "mRNA" means "cDNA"!
    Click on EMBL, GenBank or DDBJ to obtain the sequences.

  3. If only cDNA sequences are available, go to NCBI (http://www.ncbi.nlm.nih.gov/sites/) to find the genomic sequence (available for completely sequenced genomes, for example human or mouse).
    • Search "All databases" with your protein (type the protein name in the query box), and click on "GO".
    • Then click on "UniSTS: markers and mapping data" (in the right column).
    • On the results page, you will get the information about the gene position on the chromosome, and the genomic sequence.

    In "Mapping Information", Map Viewer provides an outline of the gene position on the chromosome.

  4. Locate the introns and exons by aligning the DNA and cDNA sequences.

    You can use:

    • ALIGN
    • SIM 4

    Both are accessible from IMGT "Tools for every day".
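
    As a rough illustration of what such an alignment does, the sketch below locates exons by exact matching only. It is not how ALIGN or SIM 4 work internally; the function name locate_exons, the seed length and the toy sequences are all assumptions made for this example, and real data with mismatches or short exons needs a proper alignment tool.

```python
def locate_exons(genomic, cdna, seed=8):
    """Return exon intervals in the genomic sequence (0-based, end exclusive).

    Assumes the cDNA matches the genomic sequence exactly, on the same
    strand, with exons in the same order.
    """
    exons = []
    g = 0  # where to resume searching in the genomic sequence
    c = 0  # next cDNA position not yet placed in an exon
    while c < len(cdna):
        k = min(seed, len(cdna) - c)
        start = genomic.find(cdna[c:c + k], g)
        if start == -1:
            raise ValueError(f"no exact match for cDNA position {c}")
        end = start
        # Extend the match base by base until the two sequences diverge.
        while c < len(cdna) and end < len(genomic) and genomic[end] == cdna[c]:
            end += 1
            c += 1
        exons.append((start, end))
        g = end
    return exons


# Toy sequences invented for illustration (not real gene data).
exon1, intron, exon2 = "ATGGCCAAAG", "GTAAGTTTTCTCAG", "CTGTGGACTGA"
genomic = "CC" + exon1 + intron + exon2 + "TT"
cdna = exon1 + exon2

exons = locate_exons(genomic, cdna)
print(exons)  # [(2, 12), (26, 37)]
```

    Because the extension is greedy, a boundary can be shifted by a few bases when the start of an intron happens to match the start of the next exon, which is one reason the splicing sites have to be refined by hand afterwards.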

  5. Precisely identify the splicing sites (see "Splicing site" in IMGT Aide-mémoire).

    Color the nucleotides and the amino acids according to the IMGT Color menu for splicing types (purple, green and blue for splicing frames 1, 2 and 0 respectively).
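
    A minimal sketch of this check, under stated assumptions: the exon coordinates come from a previous alignment, the coding region starts at the first nucleotide of the first exon, and the splicing frame is taken as the number of nucleotides of the split codon that lie before the intron (0 when the splice falls between two codons). The function name describe_introns and the toy sequences are hypothetical. Only the canonical GT...AG intron boundaries are tested.

```python
def describe_introns(genomic, exons):
    """Report donor/acceptor dinucleotides and splicing frame for each intron.

    `exons` is a list of (start, end) genomic intervals, 0-based, end exclusive.
    Assumes the coding region begins at the first base of the first exon.
    """
    report = []
    coding_len = 0
    for (s1, e1), (s2, _e2) in zip(exons, exons[1:]):
        coding_len += e1 - s1
        intron = genomic[e1:s2]
        report.append({
            "donor": intron[:2],       # first two bases of the intron
            "acceptor": intron[-2:],   # last two bases of the intron
            "canonical": intron[:2] == "GT" and intron[-2:] == "AG",
            "frame": coding_len % 3,   # bases of the split codon before the intron
        })
    return report


# Toy sequences invented for illustration (not real gene data).
genomic = "CC" + "ATGGCCAAAG" + "GTAAGTTTTCTCAG" + "CTGTGGACTGA" + "TT"
exons = [(2, 12), (26, 37)]

introns = describe_introns(genomic, exons)
for entry in introns:
    print(entry)
```

    Here the first exon is 10 nucleotides long, so the fourth codon is split after its first nucleotide and the intron is reported with frame 1.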

  6. Translate the exons.

    You can use:

    • Translate

    It is accessible from IMGT "Tools for every day".
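
    For readers who prefer to script this step, the following is a small sketch of a translation with the standard genetic code. It is not the IMGT "Translate" tool; the function name translate and its offset parameter are assumptions for this example, and the sequences are invented. The offset lets an exon that begins in the middle of a codon be read in the correct frame.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, offset=0):
    """Translate complete codons of `seq`, starting at `offset`.

    Trailing nucleotides that do not form a full codon are ignored;
    codons containing unknown letters are translated as 'X'.
    """
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(offset, len(seq) - 2, 3)
    )


# Toy exons invented for illustration; the codon GCT is split G|CT by the intron.
exon1 = "ATGGCCAAAG"
exon2 = "CTGTGGACTGA"

print(translate(exon1))            # MAK  (the last G is left over)
print(translate(exon2, offset=2))  # VD*  (the first two bases complete the split codon)
print(translate(exon1 + exon2))    # MAKAVD*
```

    Translating each exon separately loses the amino acid encoded by the split codon, so the full protein sequence is only obtained from the joined exons.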

  7. Join the exon nucleotide sequences, then align them against the cDNA sequence as a check.

    Look for any differences (allelic polymorphism, splicing differences, possible errors) and refine the splicing sites.
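
    A simple way to carry out this check is sketched below, assuming that the joined exons and the cDNA have no insertions or deletions relative to each other, so that a position-by-position comparison is meaningful (otherwise a real alignment is needed). The function name compare_to_cdna, the toy sequences and the single substitution are hypothetical.

```python
def compare_to_cdna(joined, cdna):
    """Compare two sequences position by position.

    Returns a list of (position, joined_base, cdna_base) with 1-based
    positions, and the length difference len(joined) - len(cdna).
    """
    diffs = [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(joined, cdna))
        if a != b
    ]
    return diffs, len(joined) - len(cdna)


# Toy sequences invented for illustration (not real gene data).
genomic = "CC" + "ATGGCCAAAG" + "GTAAGTTTTCTCAG" + "CTGTGGACTGA" + "TT"
exons = [(2, 12), (26, 37)]
joined = "".join(genomic[start:end] for start, end in exons)

# Hypothetical cDNA carrying one substitution (C to T at position 6).
cdna_variant = "ATGGCTAAAG" + "CTGTGGACTGA"

diffs, length_diff = compare_to_cdna(joined, cdna_variant)
print(diffs, length_diff)  # [(6, 'C', 'T')] 0
```

    An isolated substitution like this one suggests an allelic polymorphism or a sequencing error, whereas a run of mismatches starting at an exon boundary, or a non-zero length difference, points to a misplaced splicing site.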