L519: Lab Session 7 (10/21/05)
Today's Topics
1. Gene Prediction Tools (GenScan, TwinScan)
2. GNUPlot
1. GenScan
A. Developed by Chris Burge (Currently at MIT)
B. Eukaryotic Gene Prediction.
C. Model : Statistical (Hidden Markov Model)
E. Web GenScan Service
http://genes.mit.edu/GENSCAN.html
1) Limit : One million base pairs (1 Mbp) in length.
2) Three different types of organisms
Vertebrate, Arabidopsis, Maize
3) It predicts Genes/Exons
4) For sequences longer than 1 Mbp, use the email server or a local standalone version.
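Before submitting a sequence to the web service, it can help to check its length against the 1 Mbp limit. The following is a minimal sketch, assuming the input is in FASTA format; the names `fasta_lengths`, `fits_web_server`, and `WEB_LIMIT_BP` are hypothetical helpers, not part of GenScan, and treating a sequence of exactly 1,000,000 bp as acceptable is an assumption.

```python
WEB_LIMIT_BP = 1_000_000  # web service limit quoted above (1 Mbp)


def fasta_lengths(lines):
    """Return {record_id: length_in_bp} for FASTA text given as an iterable of lines.

    An open file handle works as `lines`, as does a list of strings.
    """
    lengths = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record id.
            parts = line[1:].split()
            current = parts[0] if parts else f"record{len(lengths) + 1}"
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def fits_web_server(lines, limit=WEB_LIMIT_BP):
    """True if every record is at most `limit` bp long (inclusive limit is assumed)."""
    return all(n <= limit for n in fasta_lengths(lines).values())
```

With a file on disk, something like `fits_web_server(open("Arabidopsis.fas"))` would indicate whether the web form is an option or whether the email server or standalone version is needed.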
F. Example
Let's use GenScan to predict genes in the following 100 kbp Arabidopsis genomic sequence.
Arabidopsis genomic sequence
* Run GenScan. GenScan output file: HTML view, PDF view
Q1) How many genes have you found in this piece of DNA?
Q2) How many exons does predicted gene #10 have?
Q3) What protein corresponds to predicted gene #14?
Another example: Homo sapiens chromosome 18, positions 58,941,000 to 59,140,000
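Questions such as Q1 and Q2 can be cross-checked by tallying the feature lines in GenScan's text output. The sketch below assumes each feature line starts with a `gene.exon` identifier (for example `10.03`) followed by a type code (`Init`, `Intr`, `Term`, `Sngl`, `Prom`, or `PlyA`), a strand, and begin/end coordinates; that layout is an assumption about the output format, and `count_exons` is a hypothetical helper name.

```python
import re

# Type codes counted as exons; Prom (promoter) and PlyA (poly-A signal) are not exons.
EXON_TYPES = {"Init", "Intr", "Term", "Sngl"}

# Assumed layout of a feature line: "<gene>.<exon> <type> <strand> <begin> <end> ..."
FEATURE_LINE = re.compile(
    r"^\s*(\d+)\.(\d+)\s+(Init|Intr|Term|Sngl|Prom|PlyA)\s+([+-])\s+(\d+)\s+(\d+)"
)


def count_exons(lines):
    """Return {gene_number: exon_count} from GenScan-style output lines."""
    exons = {}
    for line in lines:
        match = FEATURE_LINE.match(line)
        if match is None:
            continue  # header, separator, or peptide lines
        gene = int(match.group(1))
        exons.setdefault(gene, 0)
        if match.group(3) in EXON_TYPES:
            exons[gene] += 1
    return exons
```

The number of predicted genes is then `len(count_exons(lines))`, and the exon count for a given gene is the value stored under its number.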
G. Standalone Version of GenScan at Biokdd
1) Located at
/usr/local/biokdd/bin/genscan
/home4/genbank/genomes/all-fnas/software/genscanlinux
2) Options for GenScan
usage: genscan parfname seqfname [-v] [-cds] [-subopt cutoff] [-ps psfname scale]
3) Check /tmp/L519FALL2005/Lab7 for required files.
4) Sample Usage
>genscan Arabidopsis.smat Arabidopsis.fas > Arabidopsis.out
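The sample usage can also be driven from a script, which is convenient when several sequences need to be processed. This is a minimal sketch that assembles the arguments according to the usage line in item 2 and redirects standard output to a file, as the shell command does. The function names are hypothetical, the `-ps` option is left out for brevity, and `genscan` is assumed to be on the `PATH` (otherwise pass the full path listed in item 1).

```python
import subprocess


def build_genscan_command(parfname, seqfname, verbose=False, cds=False,
                          subopt=None, genscan="genscan"):
    """Assemble an argument list following: genscan parfname seqfname [-v] [-cds] [-subopt cutoff]."""
    cmd = [genscan, str(parfname), str(seqfname)]
    if verbose:
        cmd.append("-v")
    if cds:
        cmd.append("-cds")
    if subopt is not None:
        cmd += ["-subopt", str(subopt)]
    return cmd


def run_genscan(parfname, seqfname, outfname, **options):
    """Run GenScan and write its standard output to `outfname` (like `> outfname` in the shell)."""
    cmd = build_genscan_command(parfname, seqfname, **options)
    with open(outfname, "w") as out:
        # check=True raises CalledProcessError if genscan exits with a non-zero status.
        subprocess.run(cmd, stdout=out, check=True)
    return outfname
```

For the sample above, the call would be `run_genscan("Arabidopsis.smat", "Arabidopsis.fas", "Arabidopsis.out")`.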
2. TwinScan
A. TwinScan Service at Washington University.
http://genes.cs.wustl.edu/
B. TwinScan combines an HMM with sequence similarity to a second genome
(e.g., between human and mouse)
C. Try a short Arabidopsis genomic sequence <SEQ>
1) TwinScan Result
2) GenScan Result
D. Use the above Arabidopsis genomic sequence to run TwinScan and compare its result with the GenScan result.
1) TwinScan Result (PDF)
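One simple way to compare the two predictions is to list the exon coordinates from each program and see which exons agree exactly. The sketch below assumes the exons have already been collected by hand or by script as `(begin, end)` tuples; `compare_exons` is a hypothetical helper, and exact coordinate matching is only one possible comparison criterion.

```python
def compare_exons(first, second):
    """Compare two collections of (begin, end) exon coordinates by exact match."""
    first_set, second_set = set(first), set(second)
    return {
        "shared": sorted(first_set & second_set),       # predicted by both programs
        "only_first": sorted(first_set - second_set),   # predicted only by the first
        "only_second": sorted(second_set - first_set),  # predicted only by the second
    }
```

Passing the GenScan exons as `first` and the TwinScan exons as `second` gives a quick summary of where the two programs agree and where their exon boundaries differ.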
3. EST_GENOME
A. Similarity-based gene prediction program developed at the Wellcome Trust Centre for Human Genetics
B. Webpage : http://www.well.ox.ac.uk/~rmott/ESTGENOME/est_genome.shtml
4. GENEMARK
A. Prokaryotic Gene Prediction Program
B. Webpage : http://opal.biology.gatech.edu/GeneMark/
Last Modified : October 21, 2005