L519: Lab Session 3 (9/16/05)
Today's Topics :
1. Sequence Alignment Algorithms
2. BLAST
3. FASTA
4. Group Project Presentation
1. Sequence Alignment Algorithms
A. Needleman-Wunsch Global Alignment Algorithm
B. Smith-Waterman Local Alignment Algorithm
* Introduction to Alignment Methods (PPT)
* Comparison of two methods
* Try Junguk's simple sequence alignment webtool
* BLAST vs FASTA (pdf)
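To make the comparison of the two methods concrete, here is a minimal sketch of both scoring recurrences. The function names and the scoring values (match +1, mismatch -1, linear gap -2) are assumptions chosen for illustration, not the settings used by the course webtool. The only difference between the two is that the local version never lets a cell drop below zero and reports the best cell anywhere in the matrix.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score (whole of a against whole of b), linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First column and row: aligning a prefix against nothing costs only gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # The global score is read from the bottom-right corner.
    return score[-1][-1]


def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-2):
    """Local alignment score (best-scoring pair of substrings), linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]  # borders stay at 0
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The extra 0 lets an alignment restart anywhere.
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            best = max(best, score[i][j])
    return best


if __name__ == "__main__":
    x, y = "TTTACGTTT", "GGGACGGGG"
    print("global:", needleman_wunsch_score(x, y))
    print("local :", smith_waterman_score(x, y))
```

For two sequences that share only a short common region, the local score stays positive while the global score is pulled down by the mismatching flanks.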
2. BLAST
A. BLAST Programs
We need 1. a query sequence
2. a target sequence database
3. a BLAST program that matches the types of the query and database sequences
* Different types of BLAST programs
Program | Description
blastp | Compares an amino acid query sequence against a protein sequence database.
blastn | Compares a nucleotide query sequence against a nucleotide sequence database.
blastx | Compares a nucleotide query sequence translated in all reading frames against a protein sequence database. You could use this option to find potential translation products of an unknown nucleotide sequence.
tblastn | Compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames.
tblastx | Compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database. Please note that the tblastx program cannot be used with the nr database on the BLAST Web page because it is computationally intensive.
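The program choice described above depends only on the type of the query and the type of the database. As a small sketch, it can be written as a lookup table. The dictionary and function names here are hypothetical helpers for illustration and are not part of BLAST itself.

```python
# (query type, database type) -> BLAST programs that accept that combination.
BLAST_PROGRAMS = {
    ("protein", "protein"): ["blastp"],
    ("nucleotide", "nucleotide"): ["blastn", "tblastx"],
    ("nucleotide", "protein"): ["blastx"],
    ("protein", "nucleotide"): ["tblastn"],
}


def candidate_programs(query_type, db_type):
    """Return the BLAST programs usable for the given query/database types."""
    try:
        return BLAST_PROGRAMS[(query_type, db_type)]
    except KeyError:
        raise ValueError("types must be 'protein' or 'nucleotide'")


if __name__ == "__main__":
    # A DNA query searched against a protein database such as Swiss-Prot.
    print(candidate_programs("nucleotide", "protein"))
```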
B. BLAST on the web
* Try the following sample sequences with BLAST (NCBI) on the web against the "Swiss-Prot" database.
proteinSeq1.txt
DNASeq1.txt
* Did you find out where the two sequences originated from? (Hits?)
protein_BLAST_result, DNA_BLAST_result
* Find more about BLAST at NCBI Education
* BLAST Statistical background
C. BLAST Standalone version at local computer
The standalone version is available from the NCBI FTP site (ftp.ncbi.nih.gov/blast/executables/).
The latest version is available at /tmp/L519FALL2005/BLAST/
It is already installed on the 'Biokdd' server. To use the standalone version of BLAST without problems,
we need to complete a few setup steps before running an actual search.
Pay attention to the E-value when reading BLAST results: it estimates how many hits with an equal or better score are expected by chance, so smaller values indicate more significant hits.
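To give a feel for how the E-value behaves, the sketch below uses the standard relation E = m * n * 2^(-S'), where S' is the bit score, m the query length and n the total database length. The function name is hypothetical, and the numbers in the usage lines are made up for illustration. BLAST itself uses length-corrected ("effective") values for m and n, so its reported E-values will differ somewhat from this approximation.

```python
def expected_hits(bit_score, query_len, db_len):
    """Approximate E-value from a bit score using raw (uncorrected) lengths."""
    return query_len * db_len * 2.0 ** (-bit_score)


if __name__ == "__main__":
    # Illustrative values only: a 300-residue query against 1.3 million residues.
    for bits in (20, 40, 60):
        print(bits, expected_hits(bits, 300, 1_300_000))
```

Each additional bit of score halves the E-value, and doubling the database size doubles it, which is why the same alignment can be significant in a small database and not in a large one.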
*. Check whether BLAST is in your path
> which blastall
0. Let's first create a working directory for our BLAST practice.
a. Go to your home directory : >cd
b. Create L519FALL2005 directory : >mkdir L519FALL2005
c. Move into L519FALL2005 : >cd L519FALL2005
d. Create BLAST directory : >mkdir BLAST
1. You should have a '.ncbirc' in your home directory which contains the following.
[NCBI]
Data=/usr/local/bio/blast-data
* You can copy the '.ncbirc' file from /tmp/L519FALL2005/Lab/Lab3/ to your home directory
> cp /tmp/L519FALL2005/Lab/Lab3/.ncbirc ~/
2. Target sequences must be formatted before they can be searched.
a. Copy the E. coli protein sequences from the following directory into your BLAST directory
>cp /data1/genbank/genomes/Bacteria/Escherichia_coli_K12/U00096.faa ~/L519FALL2005/BLAST
b. Now perform 'formatdb' in the BLAST directory
>formatdb -i U00096.faa -n EColi -p T
c. You will see these files created in the same directory.
EColi.pin, EColi.psq, EColi.phr, formatdb.log
3. Let's perform a simple BLAST of "proteinSeq1.txt"
a. Copy the "proteinSeq1.txt" into the BLAST directory.
Located at /tmp/L519FALL2005/Lab/Lab3
b. >blastall -p blastp -d EColi -i proteinSeq1.txt -o proteinSeq1.out
c. Without -o, the result is printed to the screen : >blastall -p blastp -d EColi -i proteinSeq1.txt
4. Try changing the following options
A. -e : expectation value (Default: 10)
B. -m : alignment view option (Default: 0)
C. -b : Number of database sequences to show alignments for (Default: 250)
D. -v : Number of database sequences to show one-line descriptions for (Default: 500)
E. -g : Perform gapped alignment (Default: T)
F. -M : Scoring Matrix (Default: BLOSUM62)
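When trying many option combinations, it can help to assemble the command line in a script. The following is a minimal sketch that only builds the argument list from the flags listed above. The function name and its parameters are hypothetical, and the resulting list is meant to be handed to something like `subprocess.run` on a machine where blastall is installed.

```python
def build_blastall_command(query, db, out, program="blastp",
                           evalue=10, view=0, matrix="BLOSUM62", gapped=True):
    """Build a blastall argument list using only the flags from this handout."""
    return [
        "blastall",
        "-p", program,                 # BLAST program
        "-d", db,                      # formatted database name
        "-i", query,                   # query file
        "-o", out,                     # output file
        "-e", str(evalue),             # expectation value cutoff
        "-m", str(view),               # alignment view option
        "-g", "T" if gapped else "F",  # gapped alignment on/off
        "-M", matrix,                  # scoring matrix
    ]


if __name__ == "__main__":
    cmd = build_blastall_command("proteinSeq1.txt", "EColi",
                                 "proteinSeq1.out", evalue=0.001)
    print(" ".join(cmd))
```

Building a list rather than a single string avoids quoting problems if a file name contains spaces.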
5. What if the query file contains multiple sequences?
BLAST accepts multiple sequences in a single FASTA query file.
/tmp/L519FALL2005/Lab3/multipleSequence.fas
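Before submitting a multi-sequence query, it is useful to check how many records the file actually contains. Here is a minimal sketch of a FASTA reader. The function name is hypothetical and the example records are made up, so substitute the lines of your own file.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # finish the previous record
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines may be wrapped
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    # Made-up example records; with a real file use: read_fasta(open(path))
    example = [">seq1 first example", "MKTAYIAK", "QRQISFVK", ">seq2", "MSDNE"]
    for name, seq in read_fasta(example):
        print(name, len(seq))
```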
6. You may try other databases.
Some BLAST databases already formatted with formatdb are available at ~juhur/bioProg/BLAST/db/
a. RatRNARef - Rat RefSeq RNA dataset
b. RatProteinRef - Rat RefSeq Protein dataset
c. sprot - UniProt (Swiss-Prot)
7. There are many more options you can adjust. Simply run blastall without any arguments
and it will list all available options. For more details, refer to the README documents.
The documents are at "/tmp/L519FALL2005/BLAST/doc/" on the biokdd server.
8. Try making BLAST write its result as HTML (with -T T)
>blastall -p blastp -d EColi -i proteinSeq1.txt -o /var/www/html/juhur/index.html -T T
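For further processing in a script, the alignment view option from step 4 can be set to tabular output (-m 8 in blastall), which writes one tab-separated line per hit with twelve columns. The sketch below parses such lines and keeps the hits under an E-value cutoff. The function and field names are hypothetical, and the sample line is invented for illustration rather than taken from a real search.

```python
FIELDS = ("query", "subject", "pct_identity", "length", "mismatches",
          "gap_opens", "q_start", "q_end", "s_start", "s_end",
          "evalue", "bit_score")
INT_FIELDS = ("length", "mismatches", "gap_opens",
              "q_start", "q_end", "s_start", "s_end")
FLOAT_FIELDS = ("pct_identity", "evalue", "bit_score")


def parse_tabular_line(line):
    """Turn one tab-separated hit line into a dict with typed values."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError("expected %d columns, got %d" % (len(FIELDS), len(values)))
    hit = dict(zip(FIELDS, values))
    for key in INT_FIELDS:
        hit[key] = int(hit[key])
    for key in FLOAT_FIELDS:
        hit[key] = float(hit[key])
    return hit


def filter_hits(lines, max_evalue=1e-5):
    """Return parsed hits whose E-value is at most max_evalue."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        hit = parse_tabular_line(line)
        if hit["evalue"] <= max_evalue:
            hits.append(hit)
    return hits


if __name__ == "__main__":
    # Invented example line, for illustration only.
    sample = ["query1\tsubject1\t98.50\t200\t3\t0\t1\t200\t5\t204\t1e-100\t390\n"]
    for hit in filter_hits(sample):
        print(hit["subject"], hit["evalue"], hit["bit_score"])
```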
3. FASTA
A. FASTA programs
FASTA is another heuristic program that performs sequence searches much faster than the exact Smith-Waterman and Needleman-Wunsch algorithms.
We can use it both on the web and on a local server.
* Different types of FASTA programs : FASTA programs at the University of Virginia
Available programs from the README file.
B. FASTA on the web
* FASTA at the University of Virginia
* FASTA at EBI
* Let's try the same sequences we used for BLAST against the "Swiss-Prot" database.
proteinSeq1.txt
DNASeq1.txt
* Did you find out where the two sequences came from? (Hits?)
* How similar to or different from the BLAST results are they?
C. FASTA Standalone version at local computer
The standalone version is available from the University of Virginia FTP site (ftp://ftp.virginia.edu/pub/fasta/).
It's already installed on the 'Biokdd' server.
* Unlike BLAST, we DON'T need to format the target sequence set.
* However, the sequences must be in one of the supported formats. These include
FASTA, Swiss-Prot, GenBank, etc. (Refer to the manual for the full list)
* Let's just keep using FASTA format. Since proteinSeq1.txt does not have a comment or sequence ID line starting with '>',
let's use proteinSeq2.txt (it contains exactly the same sequence as proteinSeq1.txt)
* > fasta34_t -w 60 -a proteinSeq2.txt U00096.faa -q > proteinSeq2Fasta.out
* Refer to FASTA manual for full options and README for available programs.
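As noted above, proteinSeq1.txt lacks the '>' line that the FASTA format expects. The following is a minimal sketch of how such a raw sequence file could be given a header. The function name and the default sequence ID are hypothetical, and this is not necessarily how proteinSeq2.txt was produced.

```python
def ensure_fasta_header(text, seq_id="query1"):
    """Return text unchanged if it already starts with '>', else prepend a header line."""
    if text.lstrip().startswith(">"):
        return text
    return ">" + seq_id + "\n" + text


if __name__ == "__main__":
    raw = "MKTAYIAKQRQISFVK\n"  # made-up sequence for illustration
    print(ensure_fasta_header(raw, seq_id="proteinSeq"), end="")
```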
D. Further readings
A. Similarity Searches on Sequence Databases: BLAST, FASTA (PDF)
B. Alignment Methods : Needleman-Wunsch / Smith-Waterman (PDF)
C. Your textbook and Google.
4. Group Project Presentation
http://darwin.informatics.indiana.edu/cgi-bin/col/courses/L519/Eval/Eval.cgi
Use the password that was sent to you by email.
Last Modified : September 16, 2005