L519: Lab Session 4 (9/23/05)
Today's Topics
1. BLAST - PSI-BLAST
2. FASTA Program
3. PROSITE
4. Multiple sequence alignment programs (ClustalW, T-Coffee)
0. Quick quiz for BLAST
A. I have placed two files in the /tmp/L519FALL2005/BLAST directory:
* query.fasta
* target_sequences.fasta
B. Copy these files to your L519FALL2005/BLAST directory
C. Use the BLAST program to find out which protein this small fragment (query.fasta) belongs to.
D. Refer to the previous lab session.
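The quiz asks for BLAST. However, if the fragment happens to occur verbatim in one of the targets, a plain substring search is a quick cross-check. The Python sketch below is a minimal illustration under that assumption. `read_fasta` and `exact_hits` are hypothetical helper names, not part of BLAST or any library. It reports nothing if the fragment differs from the target by even one residue or gap, and that is exactly the case where BLAST's local alignment is needed.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def exact_hits(fragment, targets):
    """Return (header, 1-based start) for each target containing the fragment verbatim."""
    fragment = fragment.upper()
    hits = []
    for header, seq in targets:
        pos = seq.find(fragment)
        if pos != -1:
            hits.append((header, pos + 1))
    return hits
```

For example, `exact_hits(read_fasta(open("query.fasta").read())[0][1], read_fasta(open("target_sequences.fasta").read()))` lists the headers of matching targets together with 1-based start positions.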
1. PSI-BLAST (Position Specific Iterated BLAST)
A. NCBI Web Site
http://www.ncbi.nlm.nih.gov/BLAST/
Try the sequence in 'query.fasta' from the quiz above.
==> 1st round PSI-BLAST result (same as BLASTP)
==> 2nd round PSI-BLAST result (using the new PSSM)
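The PSSM is what separates later PSI-BLAST rounds from plain BLASTP. Sequences that pass the inclusion threshold in one round are aligned to the query and summarized column by column as log-odds scores, and the next round searches with that matrix. The Python sketch below shows only the core idea. It assumes gap-free, equal-length aligned segments, a uniform 1/20 background frequency and a flat pseudocount. NCBI's implementation also uses sequence weighting, real background frequencies and more elaborate pseudocounts, so these numbers will not match blastpgp's `-Q` output. `build_pssm` and `score_segment` are hypothetical names.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned, pseudocount=1.0):
    """Build a log-odds PSSM (one dict per column) from equal-length, gap-free sequences.

    Simplifying assumptions (not NCBI's method): uniform background of 1/20,
    a flat pseudocount added to every residue, and no sequence weighting.
    """
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in aligned:
            # Raises KeyError for letters outside the 20 standard amino acids.
            counts[seq[col]] += 1
        total = sum(counts.values())
        # log2(observed frequency / background): > 0 means enriched at this position.
        pssm.append({aa: math.log2((counts[aa] / total) / background)
                     for aa in AMINO_ACIDS})
    return pssm


def score_segment(pssm, segment):
    """Sum the position-specific scores of a segment as long as the PSSM (no gaps)."""
    return sum(column[aa] for column, aa in zip(pssm, segment))
```

Positive entries mark residues seen more often than the background at that position. `score_segment` adds them up the way a PSSM scores an ungapped alignment with a candidate sequence.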
B. Local PSI-BLAST
'blastpgp' (rather than 'blastall') runs PSI-BLAST. See the blastpgp manual.
1) Copy /tmp/L519FALL2005/BLAST/AE001439.faa to your BLAST directory and format it with formatdb.
>formatdb -i AE001439.faa -n Pylori -p T (-n : database name, -p T : protein sequences)
2) PSI-BLAST on the same target database
>blastpgp -i proteinSeq1.txt -d EColi -j 2 (-j : number of iterations)
>blastpgp -i proteinSeq1.txt -d EColi -j 2 -Q protein1Matrix.pssm (-Q : save the PSSM to a file)
3) PSI-BLAST on a different target database
The -C and -R flags provide a "checkpointing" facility: the score model (PSSM) built in one search
can be saved and reused in a later search.
* Create a checkpoint
>blastpgp -j 2 -C protein1.check -i proteinSeq1.txt -d EColi
* Reuse the checkpoint
>blastpgp -R protein1.check -i proteinSeq1.txt -d Pylori
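If you need to repeat the checkpoint workflow for many queries, you can script it. The Python sketch below (function names are hypothetical) assembles the two command lines from step 3 using only the flags shown there (-j, -C, -R, -i, -d). It runs them with `subprocess` only if `blastpgp` is on your PATH. It assumes both databases have already been built with formatdb.

```python
import shutil
import subprocess


def checkpoint_workflow(query, first_db, second_db, checkpoint, iterations=2):
    """Return the two blastpgp command lines of step 3 as argument lists.

    The first run iterates against first_db and saves the score model with -C;
    the second run restores it with -R and searches second_db.
    """
    build = ["blastpgp", "-j", str(iterations), "-C", checkpoint,
             "-i", query, "-d", first_db]
    reuse = ["blastpgp", "-R", checkpoint, "-i", query, "-d", second_db]
    return build, reuse


def run_all(commands):
    """Run each command in order, stopping at the first non-zero exit status."""
    if shutil.which("blastpgp") is None:
        raise FileNotFoundError("blastpgp is not on your PATH")
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

`run_all(checkpoint_workflow("proteinSeq1.txt", "EColi", "Pylori", "protein1.check"))` reproduces the two commands above. As in those commands, the same query file is passed to both runs.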
2. FASTA Program
A. See Lab Session #3.
3. PROSITE (http://ca.expasy.org/prosite/)
Database of protein families and domains
Example: Kringle domain signature and profile (http://ca.expasy.org/cgi-bin/nicedoc.pl?PDOC00020)
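PROSITE signatures are written as patterns. Elements are separated by `-`, and the notation works as follows:
* `x` is any residue
* `[..]` lists allowed residues
* `{..}` lists forbidden residues
* `(n)` or `(n,m)` gives a repeat count
* a leading `<` or trailing `>` anchors the pattern to the N- or C-terminus

This maps almost one-to-one onto regular expressions, so a small translator is enough to scan your own sequences. The Python sketch below covers only that common subset; for example, it ignores `>` inside brackets. The patterns in its test are made up, not real PROSITE entries. `prosite_to_regex` and `find_motifs` are hypothetical names.

```python
import re

# One pattern element: x, a single residue, [allowed], or {forbidden},
# optionally followed by (n) or (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern into a Python regular expression string."""
    pattern = pattern.strip().rstrip(".")
    prefix = suffix = ""
    if pattern.startswith("<"):
        prefix, pattern = "^", pattern[1:]
    if pattern.endswith(">"):
        suffix, pattern = "$", pattern[:-1]
    parts = []
    for element in pattern.split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":
            residue = "."
        elif residue.startswith("{"):
            residue = "[^" + residue[1:-1] + "]"
        if low is not None:
            residue += "{" + low + ("," + high if high else "") + "}"
        parts.append(residue)
    return prefix + "".join(parts) + suffix


def find_motifs(pattern, sequence):
    """Return 1-based start positions of non-overlapping matches in a sequence."""
    return [m.start() + 1 for m in re.finditer(prosite_to_regex(pattern), sequence)]
```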
L519: Lab Session 5 (9/30/05)
Multiple Sequence Alignment Programs
0. Test Sequence Set1 [Download SeqSet1.fas]
Click in the text box and press Ctrl+A to select all sequences, then Ctrl+C to copy them to the clipboard.
0. Test Sequence Set2 [Download SeqSet2.fas]
Click in the text box and press Ctrl+A to select all sequences, then Ctrl+C to copy them to the clipboard.
1. ClustalW
A. Introduction to ClustalW
* 'multiple-sequence.ppt' : This PPT file gives an overall introduction to multiple sequence alignment.
See page 35 onward for ClustalW.
B. EBI ClustalW web page
1) Adjustable options
2) Direct sequence input or file upload
3) Refer to the ClustalW help / FAQ in the left column
4) Input formats
NBRF/PIR, EMBL/UniProt, Pearson (FASTA), GDE, ALN/ClustalW, GCG/MSF, RSF
(see the Clustal help pages for details about these formats)
5) Consensus symbols
By default, the alignment shows the following symbols to indicate the degree of conservation in each column:
"*" means that the residues or nucleotides in that column are identical in all sequences in the alignment.
":" means that conserved substitutions have been observed, according to the COLOUR table.
"." means that semi-conserved substitutions are observed.
6) Try to align the test sequences with default options
a) Result page
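The three conservation symbols can be reproduced column by column. The Python sketch below assumes the "strong" and "weak" residue groups listed in the ClustalW/ClustalX help for the 1.8x versions. Check the help pages of the version you use, because the groups may differ between releases. The rules are:
* `*` if all residues in the column are identical
* `:` if they all fall in one strong group
* `.` if they all fall in one weak group
* a blank otherwise, including any column that contains a gap

`column_symbol` and `consensus_line` are hypothetical names.

```python
# Residue groups assumed from the ClustalW/ClustalX 1.8x help pages.
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK",
               "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column):
    """Return '*', ':', '.' or ' ' for one alignment column (a string of residues)."""
    residues = set(column.upper())
    if "-" in residues:
        return " "
    if len(residues) == 1:
        return "*"
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."
    return " "


def consensus_line(aligned):
    """Build the conservation line for a list of equal-length aligned sequences."""
    return "".join(column_symbol("".join(col)) for col in zip(*aligned))
```

For example, in a column containing only K and R, both residues fall in the strong group QHRK, so the column is marked `:`.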
C. ClustalW - Standalone version
1) Available on the biokdd server: "/usr/local/biokdd/bin/clustalw" (version 1.81)
2) Menu mode: simply run clustalw, select the input file, and run the alignment
3) Command line mode
>clustalw -infile=SeqSet1.fas -outfile=SeqSet1_ClustalW.aln
Result files: SeqSet1_ClustalW.aln, SeqSet1.dnd, SeqSet1.ph
4) For further options, run:
>clustalw -options
5) TreeView Program
Download the installer from http://taxonomy.zoology.gla.ac.uk/rod/treeview.html,
or use the executable-only version (no installation required).
6) ClustalX: Clustal with a graphical user interface (GUI)
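The command-line mode also makes it easy to script the gap-penalty comparison in the exercise at the end of this page. The Python sketch below builds argument lists in the `-option=value` style used above. `-gapopen` and `-gapext` are the gap opening and gap extension penalties listed by `clustalw -options`. The penalty values in the test are arbitrary examples, not recommendations, and the function names are hypothetical.

```python
import shutil
import subprocess


def clustalw_command(infile, outfile, gapopen=None, gapext=None, program="clustalw"):
    """Build a ClustalW command line as an argument list.

    Gap penalties are added only when given, so omitting them keeps
    ClustalW's own defaults.
    """
    cmd = [program, f"-infile={infile}", f"-outfile={outfile}"]
    if gapopen is not None:
        cmd.append(f"-gapopen={gapopen}")
    if gapext is not None:
        cmd.append(f"-gapext={gapext}")
    return cmd


def run_clustalw(cmd):
    """Run the command if the ClustalW executable can be found; raise otherwise."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found")
    subprocess.run(cmd, check=True)
```

For example, `run_clustalw(clustalw_command("hits.fas", "ClustalW2.aln", gapopen=5, gapext=0.5, program="/usr/local/biokdd/bin/clustalw"))` runs one alignment with modified penalties. Here "hits.fas" stands for whatever file holds your sequences.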
D. References
a) Higgins DG, Sharp PM. (1988) CLUSTAL: a package for performing multiple sequence alignment on a microcomputer. Gene 73:237-44.
b) Thompson JD, Higgins DG, Gibson TJ. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673-80.
c) Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG. (1997) The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25:4876-82.
E. Download
ftp://ftp.ebi.ac.uk/pub/software/dos/clustalw/
ftp://ftp.ebi.ac.uk/pub/software/dos/clustalx/
ftp://ftp.ebi.ac.uk/pub/software/unix/clustalw/
GeneDoc : Alignment Viewer / Editor
2. T-Coffee
A. Introduction to T-Coffee: visit the T-Coffee homepage
B. T-Coffee Server on the Web
1) Swiss EMBNet Node
2) Spanish EMBNet Node
3) Try to align the test sequences with default options
C. T-Coffee - Standalone version
1) Available on the biokdd server: "/tmp/L519FALL2005/T-COFFEE/bin/t_coffee"
or download it from the T-Coffee web site
2) Copy t_coffee to your directory, add its directory to your PATH, or set an alias:
>alias t_coffee="/tmp/L519FALL2005/T-COFFEE/bin/t_coffee"
3) Command line mode
>t_coffee -infile=TestSeqSet.txt -outfile=TestSeqSet_T-COFFEE_Standalone.aln
D. References
1) T-Coffee User Manual
3. Further reading
* Comparison of T-Coffee and ClustalW
http://acer.gen.tcd.ie/mmm/tco.html
* http://www.bioalgorithms.info/presentations/Ch06_MultAlign.ppt
4. Sequence Format Conversion
1. Web Services
A. BCM Search Launcher Format Converter
B. Sequence Analysis WWW Tools
C. NCBI ReadSeq Sequence Conversion Service
2. Local Version
A. ReadSeq by Dr. Don Gilbert
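For one-off conversions, the services above or ReadSeq are the practical choice. The FASTA-to-NBRF/PIR direction, though, is simple enough to show what a converter does. In the Python sketch below (the function name is hypothetical), each protein entry becomes three parts:
* a `>P1;ID` line, where P1 marks a complete protein sequence
* a description line taken from the rest of the FASTA header
* the sequence, terminated by `*`

The sketch takes already-parsed (header, sequence) pairs and assumes protein input; it does not emit the other NBRF type codes.

```python
def fasta_records_to_pir(records, width=60):
    """Convert (header, sequence) pairs to NBRF/PIR text for protein sequences."""
    entries = []
    for header, seq in records:
        # First word of the FASTA header becomes the ID; the rest is the description.
        ident, _, description = header.partition(" ")
        body = seq + "*"  # PIR sequences end with an asterisk
        lines = [body[i:i + width] for i in range(0, len(body), width)]
        entries.append(f">P1;{ident}\n{description}\n" + "\n".join(lines))
    return "\n".join(entries) + "\n"
```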
5. Exercise
1 - retrieve the ER mannosidase protein sequence (gi # = 5579331) [or any sequence of interest]
2 - do a protein-protein BLAST against the SWISS-PROT database using this sequence as a query
3 - retrieve the top 9 hits in FASTA format and save them to a file
4 - use ClustalW (either web or local) with default options ==> ClustalW1.aln
5 - change the gap penalties and check for differences in the alignment ==> ClustalW2.aln
6 - use T-Coffee to align the same sequences ==> TCoffee1.aln
7 - compare the three result files
Last Modified : September 30, 2005