This course introduces foundations and state-of-the-art machine learning challenges in genomics and the life sciences more broadly. We introduce both deep learning and classical machine learning approaches to key problems, comparing and contrasting their power and limitations. We seek to enable students to evaluate a wide variety of solutions to key problems in this rapidly developing field, and to build new solutions that can have large impact. As part of the subject, students will implement solutions to challenging problems, first in problem sets that span a carefully chosen set of tasks, and then in an independent project. Students will program using Python 3 and TensorFlow 2 in Jupyter Notebooks, a nod to the importance of carefully documenting your work so it can be precisely reproduced by others.
Syllabus and schedule
|Lecture 1||Feb 16 1pm||Course Intro + Overview Foundations|
|Lecture 2||Feb 18 1pm||ML Foundations|
|Recitation 1||Feb 19 3pm||ML Review|
|Lecture 3||Feb 23 1pm||Convolutional Neural Networks|
|Lecture 4||Feb 25 1pm||Recurrent Neural Networks, Graph Neural Networks|
|Recitation 2||Feb 26 3pm||Neural Networks Review|
|Lecture 5||Mar 2 1pm||Interpretability, Dimensionality Reduction|
|Lecture 6||Mar 4 1pm||Generative Models, GANs, VAE|
|Recitation 3||Mar 5 3pm||Interpreting ML Models|
|No class||Mar 9||Monday Class Schedule|
|Deadline||Mar 10 11:59pm||PS1 due|
|Lecture 7||Mar 11 1pm||DNA Accessibility, Promoters and Enhancers|
|Recitation 4||Mar 12 3pm||Chromatin and gene regulation|
|Lecture 8||Mar 16 1pm||Transcription Factors, DNA methylation|
|Lecture 9||Mar 18 1pm||Gene Expression, Splicing|
|Recitation 5||Mar 19 3pm||RNA-seq, Splicing|
|No class||Mar 23||Class Holiday|
|Lecture 10||Mar 25 1pm||Single cell RNA-sequencing|
|Recitation 6||Mar 26 3pm||scRNA-seq, dimensionality reduction|
|Lecture 11||Mar 30 1pm||Dimensionality Reduction, Genetics, and Variation|
|Lecture 12||Apr 1 1pm||GWAS and Rare variants|
|Deadline||Apr 1 11:59pm||PS2 due|
|Recitation 7||Apr 2 3pm||Genetics|
|Lecture 13||Apr 6 1pm||eQTLs|
|Lecture 14||Apr 8 1pm||Electronic health records and patient data|
|Recitation 8||Apr 9 3pm||ML for health data|
|Lecture 15||Apr 13 1pm||Graph analysis|
|Lecture 16||Apr 15 1pm||Drug discovery|
|Recitation 9||Apr 16 3pm||Protein structure prediction|
|No class||Apr 20||Class Holiday|
|Lecture 17||Apr 22 1pm||Protein folding|
|Deadline||Apr 23 11:59pm||PS3 due|
|Recitation 10||Apr 23 3pm||Exam prep session|
|Exam||Apr 27 11:59pm||In-class exam|
|Lecture 19||Apr 29 1pm||No lecture|
|Deadline||Apr 29 11:59pm||PS4 due|
|Recitation 11||Apr 30 3pm||Structural biology and protein folding|
|Lecture 20||May 4 1pm||Imaging applications in healthcare|
|Lecture 21||May 6 1pm||Video processing, structure determination|
|No class||May 7||Class Holiday|
|Lecture 22||May 11 1pm||Imaging and Cancer|
|Lecture 23||May 13 1pm||EHRs and data mining|
|Recitation 12||May 14 3pm||How to present|
|Deadline||May 17 11:59pm||Final project reports due|
|Lecture 24||May 18 1pm||Neuroscience|
|Deadline||May 19 11:59pm||Final presentations due|
|Deadline||May 20||In-class final presentations|
Tutorials for TensorFlow, NumPy, Google Cloud, and Jupyter notebooks
We have collected pointers to tutorials on NumPy, TensorFlow, Google Cloud, and Conda here. We also provide a Quickstart tutorial for setting up the essential environment and tools you need for problem set 0 and problem set 1.
You should be comfortable with calculus, linear algebra, (Python) programming, probability, and introductory molecular biology. This will be a fast-paced course, targeted towards students who are both mathematically and computationally capable. Many other subjects at MIT teach less demanding overviews of computational biology; we would be happy to recommend other options if this subject is not what you are looking for.
Class meeting times
- Lecture: TR1-2.30
- Recitation: F3-4
- Mentoring Session: F4-5
You should feel free to contact the lecturer and the TAs about any questions through firstname.lastname@example.org. The best way to get detailed questions answered is to attend TA office hours and recitation or post them on Piazza.
Grading will be based upon five programming-intensive problem sets (30%), a quiz (25%), a project (35%), and participation plus one day of lecture scribing (10%). Attendance in lecture is important as the class moves quickly and you will need to be present. For students enrolled in one of the graduate versions of this class (6.874, 20.490, and HST.506) there will be an extra section on some problem sets. You can use three late days for problem set deadlines (or email the course staff).
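As a rough illustration of how the weights above combine, here is a minimal Python sketch. The weights come from the grading paragraph; the component names, the example scores, and the assumption that each component is summarized as a single fraction between 0 and 1 are hypothetical, and the course staff's actual computation (including how late days are handled) may differ.

```python
# Weights taken from the grading paragraph above.
# Component names and example scores are hypothetical.
WEIGHTS = {
    "problem_sets": 0.30,
    "quiz": 0.25,
    "project": 0.35,
    "participation_and_scribing": 0.10,
}


def final_score(scores):
    """Return the weighted final score for per-component scores in [0, 1]."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical student: 90% on problem sets, 80% on the quiz,
# 85% on the project, full participation and scribing credit.
example = {
    "problem_sets": 0.90,
    "quiz": 0.80,
    "project": 0.85,
    "participation_and_scribing": 1.00,
}
print(round(final_score(example), 4))
```

The project carries the largest single weight, so in this sketch a change in the project score moves the total more than the same change in any other component.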
If you are enrolled in this course for credit, you are required to scribe for one lecture.
The requirements for lecture scribing are as follows:
- On the day of lecture you may take notes however you like. Lectures will be recorded, so asynchronous participation is fine.
- During the week after lecture, we ask that you work with everyone assigned to scribe your lecture to compile a finalized set of notes that summarize the key points of the lecture, explain important equations, images and plots, illustrate or describe relevant things that were written on the board, and describe any important questions and answers exchanged between students and the professor.
- The end goal is for you to generate a compact resource which you and your classmates can use to glean the important material from your lecture. The finalized notes should generally adhere to and extend from the structure outlined by the headings at the beginning of the notes template.
- The notes template and finished scribed notes may be found here.
- Let the course staff know you are finished compiling the notes by sending an email to email@example.com. The deadline for completing the notes will be end-of-day one week after your lecture (e.g. notes from a lecture on 2/18 will be due on 2/25 @ 11:59 PM).
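The deadline rule above (end of day, one week after the lecture) can be sketched with Python's standard `datetime` module. The function name `scribe_deadline` is hypothetical, and the year 2021 in the example is an assumption, since the schedule lists only months and days.

```python
from datetime import date, datetime, time, timedelta


def scribe_deadline(lecture_day):
    """Return the scribe-notes deadline: 11:59 PM one week after the lecture."""
    due_day = lecture_day + timedelta(days=7)
    return datetime.combine(due_day, time(23, 59))


# Example from the text: a lecture on 2/18 is due on 2/25 at 11:59 PM.
# The year (2021) is an assumption; the schedule does not state it.
print(scribe_deadline(date(2021, 2, 18)))  # prints 2021-02-25 23:59:00
```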
This subject has a substantial project component. We strongly recommend working on projects in teams of 2-3 students, but if there’s a strong justification, we can consider exceptions. You are free to choose any problem in the life sciences related to the lectures of the course, and develop a deep learning solution using the subject’s methodologies or cloud resources. We will have extensive mentoring resources to help provide guidance, access to datasets, and biological insights. We will hold mentoring sessions during which you will have a chance to refine your ideas in consultation with the teaching staff and research mentors for each research area.
Another useful book is the Matrix Cookbook, an extensive collection of facts about matrices.