Analysis of Next-Gen Sequencing Data (CMPB 5004 03)
Course Director: Luce Skrabanek, PhD
After completing this course, students will:
- Have a deep appreciation of current DNA sequencing technologies, and an awareness of their pitfalls, caveats, and confounding factors;
- Understand which technologies are appropriate for which use cases;
- Be aware of the details involved in deriving insights from raw data;
- Be able to critically assess next-generation sequencing data and analyses, and recognize common biases.
Next-generation DNA sequencing technology has revolutionized our ability to ask almost any question of our genome, epigenome, or transcriptome. In Part I of the course, we focus on the principles of the dominant technology: the Illumina short-read sequencing-by-synthesis platform. We then examine the complete analysis pipeline in detail, from the generation of raw reads, through alignment to the genome (Part II), to gene-centric analyses (Part III). At each step, we place a strong emphasis on quality control, highlighting the limitations and pitfalls of the most widely used tools, as well as ways to address them. In Part IV, we survey alternative DNA sequencing technologies and their applications.
Students will apply the knowledge gained throughout this course to a practical project focused on the analysis of one or more NGS data types to address a biomedically relevant question.
Course Requirements and Grading
70% of the grade is based on an individual project that uses techniques learned in class to explore a meaningful biological question. The project will be developed throughout the course, with weekly opportunities for refinement and feedback. The remaining 30% is based on weekly short programming exercises.
Time: Tuesdays and Thursdays 10am-11:30am.