L519: Lab Session 1 (9/2/05)
Today's Topics :
1. Introduction to L519 Lab Sessions
2. Unix commands
3. NCBI PubMed / GenBank
4. Simple Perl script to convert gbk to FASTA
5. Homework #1 (including group project)
1. Introduction to L519 Lab Sessions
* L519 lab sessions will cover many practical bioinformatics software packages. These include:
| Program | Function | Availability on BIOKDD | Web | Remarks |
|---|---|---|---|---|
| BLAST | Sequence search | O | | |
| FASTA | Sequence search | O | | |
| ClustalW | Sequence alignment | O | | |
| T-Coffee | Sequence alignment | O | | |
| Phred/Phrap | DNA sequencing | O | | |
| GenScan | Gene prediction | O | | |
| GeneMark | Gene prediction | X | | |
| MEME | Motif finding | O | | |
| Gibbs sampler | Motif finding | O | | |
| Mfold | RNA structure prediction | X | | |
| RSEARCH | RNA structure search | X | | |
| SAM | Significance Analysis of Microarrays | X | R package | |
| Predator | Protein secondary structure prediction | X | | |
| Threader | Protein structure prediction | X | | |
| DALI | Comparing protein 3D structures | X | | |
| Rasmol | Structure viewer | X | Windows | |
* Group project presentation.
2. Unix commands
Much bioinformatics software runs on Unix/Linux rather than MS Windows, especially software that handles huge amounts of data, so you must be familiar with the basic Unix commands. (Note: if you already know MS-DOS commands, many Unix commands will look familiar, since a number of DOS commands were modeled on Unix ones.) Please use the following references for Unix commands.
A. Very basic Unix commands
Directory : ls, dir, cd, pwd, mkdir, rmdir
File Handling : mv, cp, rm, chmod
File Content : cat, more, less
Editor : vi, emacs, pico
Archive, compress : gzip, gunzip, compress, uncompress, tar
People : who, whoami, finger
Location : find, which, whereis
Internet : telnet, rlogin, ssh
File Transfer : ftp, sftp
Manual : man
B. Your text book by David Mount. Chapter 12 Appendix. Table
C. Web Ref 1. http://www.computerhope.com/unix.htm
D. Web Ref 2. http://doors.stanford.edu/~sr/computing/basic-unix.html
E. Web Ref 3. http://www.emba.uvm.edu/CF/basic.html
F. Use Google
G. Put the following line into your '.bashrc' file
export PATH=$PATH:/usr/local/bio/bin
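After logging in again, it can be useful to confirm that the directory really is on your PATH. The following is only a minimal Python sketch and not part of the course materials: the helper names path_entries and on_path are made up for illustration, and it assumes the Unix convention that PATH entries are separated by ':'.

```python
import os


def path_entries(path_value):
    """Split a colon-separated PATH string into its non-empty directories."""
    return [entry for entry in path_value.split(":") if entry]


def on_path(directory, path_value):
    """Return True if `directory` is one of the entries in `path_value`.

    Trailing slashes are ignored, so '/usr/local/bio/bin/' and
    '/usr/local/bio/bin' are treated as the same directory.
    """
    wanted = directory.rstrip("/")
    return any(entry.rstrip("/") == wanted for entry in path_entries(path_value))


# Check the PATH of the current shell session.
current_path = os.environ.get("PATH", "")
print(on_path("/usr/local/bio/bin", current_path))
```

If this prints False right after editing '.bashrc', the file has probably not been re-read yet; opening a new shell or running 'source ~/.bashrc' should pick up the change.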
3. NCBI
NCBI (National Center for Biotechnology Information) is a division of the National Library of Medicine (NLM) at the National Institutes of Health (NIH). Its programs and activities include
Basic Research
Databases and Software
* GenBank DNA sequence database : exchanges data with EMBL (European Molecular Biology Laboratory) and DDBJ (DNA Data Bank of Japan)
* OMIM, MMDB, UniGene, Gene Map of the Human Genome etc
* Entrez : NCBI's search and retrieval system
* PubMed : Web search interface to the journal literature, providing access to over 11 million citations in MEDLINE
* BLAST : Sequence similarity searching program at NCBI
4. PubMed / GenBank
A. PubMed: Public version of MEDLINE + other life science journals
Currently 11 million citation records from 7300 journals dating from 1965 to present.
Updated weekly.
B. Try to find the two papers published in 2001 that report the draft sequence of the human genome
C. NM_000546 : human TP53 (tumor protein p53) mRNA record
D. NCBI Resource guide
E. NCBI Field guide (PDF)
YOU MUST SEE
5. Conversion of gbk sequence file to FASTA file
A. Introduction to GenBank File Format 'gbk' (Web, flatfile)
B. Sample record and Field description
C. FASTA format description
D. Genbank access via FTP : ftp.ncbi.nih.gov
E. Conversion from 'gbk' to 'FASTA'
1) On NCBI web : Select a different display type
: Suitable for a small number of sequences
2) Batch Entrez : Up to 5000 records
3) BioPerl, CPAN
4) Simple Perl Script : Script1 , Script2
Input, Output
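As a rough illustration of option 4, the sketch below performs the conversion in Python rather than Perl. It is not the course's Script1 or Script2. It assumes the standard flat-file layout: a DEFINITION field that may continue on indented lines, an ACCESSION line, sequence lines after ORIGIN, and '//' closing each record. The names format_record and genbank_to_fasta, the width parameter, and the toy record with accession TEST0001 are all hypothetical.

```python
def format_record(accession, definition, sequence, width):
    """Build the FASTA lines (header plus wrapped sequence) for one record."""
    header = ">" + " ".join([accession] + definition).strip()
    lines = [header]
    lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return lines


def genbank_to_fasta(gbk_text, width=60):
    """Convert GenBank flat-file text to FASTA text.

    Records without any sequence after ORIGIN are skipped.
    """
    fasta_lines = []
    accession, definition, seq_parts = "", [], []
    section = None  # "definition", "origin", or None

    for line in gbk_text.splitlines():
        if line.startswith("//"):
            # End of record: emit it and reset the per-record state.
            if seq_parts:
                fasta_lines.extend(
                    format_record(accession, definition, "".join(seq_parts), width))
            accession, definition, seq_parts, section = "", [], [], None
        elif section == "origin":
            # Sequence lines look like '   61 acgt acgt ...'; keep letters only.
            letters = "".join(ch for ch in line if ch.isalpha())
            if letters:
                seq_parts.append(letters)
        elif line.startswith("DEFINITION"):
            definition = [line[len("DEFINITION"):].strip()]
            section = "definition"
        elif line.startswith("ACCESSION"):
            fields = line.split()
            accession = fields[1] if len(fields) > 1 else ""
            section = None
        elif line.startswith("ORIGIN"):
            section = "origin"
        elif section == "definition" and line.startswith(" "):
            # Indented continuation of a multi-line DEFINITION field.
            definition.append(line.strip())
        else:
            section = None

    # Tolerate a final record that lacks the '//' terminator.
    if seq_parts:
        fasta_lines.extend(
            format_record(accession, definition, "".join(seq_parts), width))

    return "\n".join(fasta_lines) + ("\n" if fasta_lines else "")


# Toy input (not a real GenBank entry) to show the call.
toy_record = """LOCUS       TEST0001  12 bp    DNA
DEFINITION  Toy record for
            testing.
ACCESSION   TEST0001
ORIGIN
        1 acgtacgtac gt
//
"""
print(genbank_to_fasta(toy_record), end="")
```

A hand-written parser like this ignores features, multiple accessions, and CONTIG-only records, so for anything beyond a quick conversion a maintained parser such as BioPerl's Bio::SeqIO (option 3) is probably the safer choice.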
Last Modified : September 2, 2005
Maintained by : Junguk Hur