I529: Bioinformatics in Molecular Biology and Genetics: Practical Applications (4CR)
Spring Semester 2007
Lecture: M/W 4-5:15pm, I107
Office Hour: TBA
Eigenmann 1008
Lab: Fri, 4-5:30pm I109
Instructor: Haixu Tang
AI: Huijun Wang
Description: We aim to introduce a broad
range of applications, from fundamental to advanced, of
bioinformatics methods and tools to solving problems in genomics and molecular biology.
Prior to this class, the students should have learned basic methods and
theories in bioinformatics, e.g. by taking I519. In this class, we will focus on
how to apply them to solving biological problems in real life.
Some advanced computational techniques that are widely applied in bioinformatics,
e.g. Hidden Markov Models (HMM) and Bayesian Networks (BN),
will be discussed in detail in the class.
The important themes that will be covered by this course include
- Sequence modeling and classification
- Genome annotation
- Motif finding
- Genome comparison
- Protein families
- Non-coding RNAs
- MicroRNAs and their targets
- Functional prediction
- Phylogenetics
- Mass spectrometry and proteomics
This class will have a separate lab section, in which the students will be taught
how to solve biological problems in a step-by-step fashion. The programs that will be covered
in the lab of this class include
- Sequence modeling using Markov chains: seq++;
- Pair HMM: SLAM, TwinScan, QRNA;
- HMM: Genscan;
- Profile HMM: Hmmer, Pfam;
- Stochastic Context Free Grammar (SCFG): COVE;
- Non-coding RNA search: Rsearch;
- Phylogenetics: PHYLIP, PAML;
Students will be instructed to write scripts (Perl and PHP preferable) and/or programs that make use of current implementations of sophisticated algorithms, such as HMM, BN, SVM, etc., to solve biological problems.
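As a rough illustration of the kind of script meant here, the following minimal sketch trains a first-order Markov chain on DNA sequences and scores a new sequence by log-odds between two models. It is written in Python rather than the preferred Perl or PHP; the function names, the pseudocount of 1, and the toy training sequences are assumptions made up for this example and are not course material.

```python
import math

ALPHABET = "ACGT"  # assumes uppercase DNA with no ambiguity codes


def train_markov_chain(sequences, pseudocount=1.0):
    """Estimate first-order transition probabilities P(b | a) from DNA strings."""
    # Start every transition count at the pseudocount to avoid zero probabilities.
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    probs = {}
    for a in ALPHABET:
        total = sum(counts[a].values())
        probs[a] = {b: counts[a][b] / total for b in ALPHABET}
    return probs


def log_likelihood(seq, probs):
    """Sum of log transition probabilities; the first base is treated as given."""
    return sum(math.log(probs[a][b]) for a, b in zip(seq, seq[1:]))


def log_odds(seq, model_pos, model_neg):
    """Positive values favor model_pos, negative values favor model_neg."""
    return log_likelihood(seq, model_pos) - log_likelihood(seq, model_neg)


if __name__ == "__main__":
    # Toy training data, invented for illustration only.
    cg_rich = train_markov_chain(["CGCGCGCG"])
    at_rich = train_markov_chain(["ATATATAT"])
    print(log_odds("CGCG", cg_rich, at_rich))
```

In the labs, the training step would normally be handled by an existing tool (for example seq++), with the script only preparing input and interpreting the scores.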
This course is designed for advanced-level bioinformatics graduate students who have taken I519. Graduate students with either biology or physical/computer science backgrounds who are interested in bioinformatics applications in molecular biology are also welcome to take this course.
Textbook: Richard Durbin, Sean R. Eddy, Anders Krogh, and Graeme Mitchison,
Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids,
Cambridge University Press, 1999 (BSA).
Last updated: 12/25/2006
Some of the topics in this course cannot be found in this book. We will
distribute complementary lecture notes and reading materials throughout the course
for these topics. We also recommend that students read Nello Cristianini
and Matthew W. Hahn,
Introduction to Computational Genomics: A Case Studies Approach,
Cambridge University Press, 2006.
Assignments: We will have 5 take-home
assignments and 1 class project.
Grading: Combined assignments (30%), one mid-term exam (20%), final exam (25%),
class project (20%), attendance (5%).
Office hour: Haixu Tang: TBA,
Eigenmann 1008, or by appointment
Office hour: Huijun Wang: TBA,
Ph.D. student office, or by appointment
Prerequisites:
I519 or equivalent knowledge in bioinformatics is required.
Group Assignment: The class will be divided into several small groups for mini projects.
The group assignment is going to be determined in the first class.
Final projects (please email me if you have any questions regarding these projects)
Preliminary syllabus [This may change!]:
Week
Date
Contents
Lecture notes
1
1/8 Mon.
Introduction to the class
BSA 1.1 - 1.2
The primer of Perl
Hypertext Preprocessor PHP
-- we will use it for the web site design in this class.
1/10 Wed.
Probabilistic modeling
BSA 1.4, Chapter 11
Notes
1/12 Fri.
Lab1: Web site design using PHP and MySQL
(Homework 1)
2
1/15 Mon.
No class (Martin Luther King Jr. Day)
1/17 Wed.
Probabilistic sequence modeling I:
frequency and profiles
Notes
1/19 Fri.
Lab2: Alignment algorithms: Smith-Waterman, FASTA and BLAST
3
1/22 Mon.
Probabilistic sequence modeling I:
frequency and profiles
1/24 Wed.
Probabilistic sequence modeling II: Markov chain
BSA Chapter 4
Notes
1/26 Fri.
Lab3: Modeling biological sequences using seq++; blocks and related tools; sequence WebLogo
(Homework 1 due)
4
1/29 Mon.
Probabilistic sequence modeling II: Markov chain
(Homework 2)
1/31 Wed.
Hidden Markov Model I: Model structure
BSA Chapter 3
Notes
2/2 Fri.
Group Discussion
5
2/5 Mon.
Hidden Markov Model I: Model structure
2/7 Wed.
Hidden Markov Model II: GHMM
2/9 Fri.
Lab4: GeneMark.HMM & Genscan
6
2/12 Mon.
HMM III: Parameter estimation
(Homework 3)
BSA Chapter 3
Notes
2/14 Wed.
HMM III: parameter estimation
2/16 Fri.
Group Discussion
(Homework 2 due)
7
2/19 Mon.
EM algorithm
Notes
2/21 Wed.
EM algorithm
2/23 Fri.
Lab5: SLAM, TwinScan, QRNA
8
2/26 Mon.
Pair HMM I
BSA Chapter 4
Notes
2/28 Wed.
Pair HMM II
9
3/5 Mon.
Profile HMM I
(Homework 4)
BSA Chapter 5
Notes
3/7 Wed.
Midterm
10
Spring recess
11
3/19 Mon.
Profile HMM II
BSA Chapter 5
3/21 Wed.
Profile HMM III
3/23 Fri.
Lab5: Pfam & Hmmer
(Homework 3 due)
12
3/26 Mon.
Gibbs Sampling
Notes
3/28 Wed.
Advanced probabilistic graphical models
Notes
3/30 Fri.
Group Discussion
13
4/2 Mon.
Phylogenetics: distances and evolutionary models
(Homework 5)
BSA Chapter 7
Notes
4/4 Wed.
Phylogenetics: Neighbor joining (NJ) tree
BSA Chapter 7
4/6 Fri.
Lab6: ClustalW, PHYLIP, Treeview/ATV
(Homework 4 due)
14
4/9 Mon.
Phylogenetics: parsimony
BSA Chapter 7
Notes
4/11 Wed.
Phylogenetics: bootstrap
BSA Chapter 7
4/13 Fri.
Lab7: PHYLIP: more examples
15
4/16 Mon.
Phylogenetics: phylogeny and alignment
BSA 7
4/18 Wed.
Phylogenetics: probabilistic models of evolution
BSA 9
Notes
4/20 Fri.
Lab8: PAML
(Homework 5 due)
16
4/23 Mon.
Phylogenetics: Maximum likelihood (ML) method
BSA 9
Notes
4/25 Wed.
Project presentations (continued on Friday, 4/27)
17
4/30 Mon.
Final Exam
5/4 Fri.
Final project report due