L519:  Lab Session 1 (9/2/05)

 

Today's Topics :
    1. Introduction to L519 Lab Sessions
    2. Unix commands
    3. NCBI
    4. PubMed / GenBank
    5. Conversion of gbk sequence file to FASTA file (simple Perl script)
    6. Homework #1 (including group project)

 

1. Introduction to L519 Lab Sessions
     
* The L519 lab sessions will cover many practical bioinformatics software packages, including:

Program         Function                                  Availability on BIOKDD   Web   Remarks
BLAST           Sequence search                           O                        O
FASTA           Sequence search                           O                        O
ClustalW        Sequence alignment                        O                        O
T-coffee        Sequence alignment                        O                        O
Phred/Phrap     DNA sequencing                            O                        O
GenScan         Gene prediction                           O                        O
GeneMark        Gene prediction                           X                        O
MEME            Motif finding                             O                        O
Gibbs sampler   Motif finding                             O                        O
Mfold           RNA structure prediction                  X                        O
RSEARCH         RNA structure search                      X                        O
SAM             Significance analysis of Microarray       X                        O     R package
Predator        Protein secondary structure prediction    X                        O
Threader        Protein structure prediction              X                        O
DALI            Comparing protein 3D structures           X                        O
Rasmol          Structure viewer                          X                        O     Windows

(O = available, X = not available)


  * Group project presentation.
 


2. Unix commands

     Much bioinformatics software runs on Unix/Linux rather than MS Windows, especially software that handles huge amounts of data, so you must be familiar with basic Unix commands. (Note: if you already know MS-DOS commands, many Unix commands will look familiar, since DOS borrowed several ideas from Unix, such as hierarchical directories with cd and mkdir.) Please use the following references for Unix commands.
     A.  Very basic Unix commands
           Directory                   : ls, dir, cd, pwd, mkdir, rmdir 
           File Handling           : mv, cp, rm, chmod
           File Content              : cat, more, less
           Editor                         : vi, emacs, pico
           Archive, compress  : gzip, gunzip, compress, uncompress, tar
           People                        :  who, whoami, finger
           Location                    : find, which, whereis
           Internet                     : telnet, rlogin, ssh
           File Transfer            : ftp, sftp
           Manual                     : man
     B. Your textbook by David Mount: Chapter 12 Appendix, Table
     C. Web Ref 1.   http://www.computerhope.com/unix.htm
     D. Web Ref 2.  http://doors.stanford.edu/~sr/computing/basic-unix.html
     E. Web Ref 3.  http://www.emba.uvm.edu/CF/basic.html
     F. Use Google
     G. Put the following line into your '.bashrc' file so that the shell can find the programs installed under /usr/local/bio/bin
          export PATH=$PATH:/usr/local/bio/bin
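
     The commands in list A can be tried safely in a scratch directory. The following is a minimal sketch, assuming a Bourne-compatible shell such as bash with mktemp and gzip installed. The directory and file names (lab1, notes.txt, backup.txt) are made up for illustration; /usr/local/bio/bin is the path from item G.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing important is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir lab1                          # create a directory
cd lab1 || exit 1                   # move into it
pwd                                 # print the current directory
printf 'first line\n' > notes.txt   # create a small text file
cp notes.txt copy.txt               # copy a file
mv copy.txt backup.txt              # rename (move) a file
chmod 600 backup.txt                # read/write for the owner only
ls -l                               # list the directory contents
cat notes.txt                       # show the file content
gzip backup.txt                     # compress   -> backup.txt.gz
gunzip backup.txt.gz                # uncompress -> backup.txt
rm backup.txt                       # delete a file

# Item G: add the course tool directory to the search path of this shell.
PATH=$PATH:/usr/local/bio/bin
export PATH
echo "$PATH"                        # the new directory appears at the end
```

     The PATH change made this way lasts only for the current shell; putting the export line into '.bashrc' applies it to every new bash session.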



3. NCBI
    NCBI (National Center for Biotechnology Information) is a division of the National Library of Medicine (NLM) at the National Institutes of Health (NIH). Its programs and activities include
    Basic Research
    Databases and Software
       * GenBank DNA sequence database : exchanges data with EMBL (European Molecular Biology Laboratory) and DDBJ (DNA Data Bank of Japan)
       * OMIM, MMDB, UniGene, Gene Map of the Human Genome, etc.
       * Entrez : NCBI's search and retrieval system
       * PubMed : Web search interface to the journal literature, providing access to over 11 million journal citations in MEDLINE
       * BLAST : Sequence similarity search program at NCBI

 

4. PubMed / GenBank
     A. PubMed: Public version of MEDLINE + other life science journals
           Currently 11 million citation records from 7300 journals dating from 1965 to present.
           Updated weekly.
     B. Try to find two papers published in 2001 regarding the completion of the draft human genome sequence
     C. NM_000546 : human TP53 (tumor protein p53)
     D. NCBI Resource guide
     E. NCBI Field guide (PDF)
          YOU MUST SEE  Slide1,  Slide2

 

5. Conversion of gbk sequence file to FASTA file
    A. Introduction to GenBank File Format 'gbk' (Web, flatfile)
    B. Sample record and Field description
    C. FASTA format description
    D. GenBank access via FTP : ftp.ncbi.nih.gov
    E. Conversion from 'gbk' to 'FASTA'
          1) On the NCBI web site : Select a different display type
                                         : Suitable for a small number of sequences
          2) Batch Entrez  : Up to 5000 records
          3) BioPerl, CPAN
          4) Simple Perl Script : Script1, Script2 (Input, Output)
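
    To make the conversion idea concrete, here is a minimal sketch written as a shell function around awk. It is not the course's Script1 or Script2 (those are Perl scripts and are not reproduced here). The function name gbk2fasta and the toy record are made up for illustration, and the sketch assumes a standard GenBank flat file in which each record has DEFINITION, ACCESSION and ORIGIN lines and ends with '//'. For real data, BioPerl is the more robust choice.

```shell
#!/bin/sh
# gbk2fasta (hypothetical helper): convert GenBank flat-file records to FASTA.
# Reads the named files, or standard input when no file is given.
gbk2fasta() {
    awk '
    # Print one FASTA record: header line, then the sequence in 60-column rows.
    function print_record(   i) {
        printf ">%s %s\n", acc, def
        for (i = 1; i <= length(seq); i += 60) print substr(seq, i, 60)
    }
    # DEFINITION may continue on indented lines; join them with single spaces.
    /^DEFINITION/  { def = $0; sub(/^DEFINITION +/, "", def); in_def = 1; next }
    in_def && /^ / { line = $0; sub(/^ +/, "", line); def = def " " line; next }
                   { in_def = 0 }
    /^ACCESSION/   { acc = $2; next }               # first accession number
    /^ORIGIN/      { in_seq = 1; seq = ""; next }   # sequence block starts
    /^\/\//        { if (in_seq) print_record()     # end of record
                     in_seq = 0; acc = ""; def = ""; next }
    # Inside the sequence block: drop position numbers and blanks.
    in_seq         { line = $0; gsub(/[^A-Za-z]/, "", line); seq = seq line }
    ' "$@"
}

# Toy input record (made-up accession and sequence, not real GenBank data).
toy=$(mktemp)
cat > "$toy" <<'EOF'
LOCUS       TEST0001                  70 bp    DNA     linear   SYN 02-SEP-2005
DEFINITION  Toy record for the lab,
            not a real sequence.
ACCESSION   TEST0001
ORIGIN
        1 acgtacgtac gtacgtacgt acgtacgtac gtacgtacgt acgtacgtac gtacgtacgt
       61 ttttgggggg
//
EOF

gbk2fasta "$toy"
```

    For the toy record, the output is a header line '>TEST0001 Toy record for the lab, not a real sequence.' followed by one 60-letter sequence line and one 10-letter line. Records without an ORIGIN block are skipped, because they carry no sequence to print.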
   


6. HW1 Mini-project group


Last Modified : September 2, 2005

Maintained by : Junguk Hur