Course description

This course introduces foundations and state-of-the-art machine learning challenges in genomics and, more broadly, the life sciences. We introduce both deep learning and classical machine learning approaches to key problems, comparing and contrasting their power and limitations. We seek to enable students to evaluate a wide variety of solutions to key problems in this rapidly developing field, and to develop new enabling solutions that can have a large impact. As part of the subject, students will implement solutions to challenging problems, first in problem sets that span a carefully chosen set of tasks, and then in an independent project. Students will program in Python 3 and TensorFlow 2 using Jupyter Notebooks, a nod to the importance of documenting your work carefully so that others can reproduce it precisely.

Syllabus and schedule

 Session          When             Description
Lecture 1 Feb 16 1pm Course Intro + Overview Foundations
Lecture 2 Feb 18 1pm ML Foundations
Recitation 1 Feb 19 3pm ML Review
Lecture 3 Feb 23 1pm Convolutional Neural Networks
Lecture 4 Feb 25 1pm Recurrent Neural Networks, Graph Neural Networks
Recitation 2 Feb 26 3pm Neural Networks Review
Lecture 5 Mar 2 1pm Interpretability, Dimensionality Reduction
Lecture 6 Mar 4 1pm Generative Models, GANs, VAEs
Recitation 3 Mar 5 3pm Interpreting ML Models
No class Mar 9 Monday class schedule in effect
Deadline Mar 10 11:59pm PS1 due
Lecture 7 Mar 11 1pm DNA Accessibility, Promoters and Enhancers
Recitation 4 Mar 12 3pm Chromatin and gene regulation
Lecture 8 Mar 16 1pm Transcription Factors, DNA methylation
Lecture 9 Mar 18 1pm Gene Expression, Splicing
Recitation 5 Mar 19 3pm RNA-seq, Splicing
No class Mar 23 Class Holiday
Lecture 10 Mar 25 1pm Single-cell RNA sequencing
Recitation 6 Mar 26 3pm scRNA-seq, dimensionality reduction
Lecture 11 Mar 30 1pm Dimensionality Reduction, Genetics, and Variation
Lecture 12 Apr 1 1pm GWAS and Rare variants
Deadline Apr 1 11:59pm PS2 due
Recitation 7 Apr 2 3pm Genetics
Lecture 13 Apr 6 1pm eQTLs
Lecture 14 Apr 8 1pm Electronic health records and patient data
Recitation 8 Apr 9 3pm ML for health data
Lecture 15 Apr 13 1pm Graph analysis
Lecture 16 Apr 15 1pm Drug discovery
Recitation 9 Apr 16 3pm Protein structure prediction
No class Apr 20 Class Holiday
Lecture 17 Apr 22 1pm Protein folding
Deadline Apr 23 11:59pm PS3 due
Recitation 10 Apr 23 3pm Exam prep session
Exam Apr 27 11:59pm In-class exam
Lecture 19 Apr 29 1pm No lecture
Deadline Apr 29 11:59pm PS4 due
Recitation 11 Apr 30 3pm Structural biology and protein folding
Lecture 20 May 4 1pm Imaging applications in healthcare
Lecture 21 May 6 1pm Video processing, structure determination
No class May 7 Class Holiday
Lecture 22 May 11 1pm Imaging and Cancer
Lecture 23 May 13 1pm EHRs and data mining
Recitation 12 May 14 3pm How to present
Deadline May 17 11:59pm Final project reports due
Lecture 24 May 18 1pm Neuroscience
Deadline May 19 11:59pm Final presentations due
Deadline May 20 In-class final presentations

Tutorials for TensorFlow, NumPy, Google Cloud, and Jupyter notebooks

We have collected a series of pointers to tutorials on NumPy, TensorFlow, Google Cloud, and Conda here. We also provide a Quickstart tutorial for setting up the essential environment and tools you need to work on problem set 0 and problem set 1.
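If you want a quick sanity check after following the Quickstart, a short script like the one below can confirm that you are running Python 3 and report which of the course's core packages are importable, along with their versions. This is a minimal sketch, not part of the official Quickstart; the function name `check_environment` and the default package list are assumptions chosen for illustration.

```python
import importlib.util
import sys


def check_environment(packages=("numpy", "tensorflow")):
    """Report the Python version and, for each package, its version.

    A value of None means the package is not installed; "unknown" means
    it imported but exposes no __version__ attribute.
    """
    report = {"python": sys.version.split()[0], "packages": {}}
    for name in packages:
        if importlib.util.find_spec(name) is None:
            report["packages"][name] = None  # not installed here
            continue
        try:
            module = importlib.import_module(name)
        except ImportError:
            report["packages"][name] = "import failed"  # installed but broken
            continue
        report["packages"][name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    info = check_environment()
    assert sys.version_info.major == 3, "the course uses Python 3"
    print("Python", info["python"])
    for name, version in info["packages"].items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
    tf_version = info["packages"].get("tensorflow")
    if tf_version and tf_version[0].isdigit() and not tf_version.startswith("2."):
        print("Warning: the course uses TensorFlow 2; found", tf_version)
```

Using `importlib.util.find_spec` first lets the script report a missing package without raising, so you see the status of every package in one run.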

Prerequisites

You should be comfortable with calculus, linear algebra, programming (in Python), probability, and introductory molecular biology. This will be a fast-paced course, targeted at students who are both mathematically and computationally capable. Many other subjects at MIT teach less demanding overviews of computational biology; we would be happy to recommend other options if you find this subject is not what you are looking for.

Class meeting times

  • Lecture: TR1-2.30 (Tuesdays and Thursdays, 1:00-2:30pm)
  • Recitation: F3-4 (Fridays, 3-4pm)
  • Mentoring Session: F4-5 (Fridays, 4-5pm)

Contact

Feel free to contact the lecturer and the TAs with any questions at 6.874staff@mit.edu. The best way to get detailed questions answered is to attend TA office hours and recitation, or to post them on Piazza.

Office hours

Manolis Kellis (manoli@mit.edu): Mon 5-6pm
Zheng Dai, Dylan Cable: Tue 4-5pm
Jackie Valeri, Tessa Gustafson: Wed 7-8pm

Grading

Grading will be based on five programming-intensive problem sets (30%), a quiz (25%), a project (35%), and participation plus one day of lecture scribing (10%). Attending lecture is important: the class moves quickly, and you will need to be present. Students enrolled in one of the graduate versions of this class (6.874, 20.490, and HST.506) will have an extra section on some problem sets. You may use three late days on problem set deadlines; for anything beyond that, email the course staff.
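To see how the weights combine, here is a minimal sketch of the arithmetic. It assumes each component is scored out of 100 and that the five problem sets count equally toward their 30% share; the page does not specify either detail, and the scores below are hypothetical.

```python
import math
from statistics import mean

# Weights from the grading policy, as fractions of the final grade.
WEIGHTS = {
    "problem_sets": 0.30,
    "quiz": 0.25,
    "project": 0.35,
    "participation_and_scribing": 0.10,
}


def final_grade(ps_scores, quiz, project, participation):
    """Return the weighted final grade on a 0-100 scale.

    Assumption: every input is on a 0-100 scale, and the problem sets
    are averaged with equal weight before applying their 30% share.
    """
    if len(ps_scores) != 5:
        raise ValueError("expected scores for five problem sets")
    components = {
        "problem_sets": mean(ps_scores),
        "quiz": quiz,
        "project": project,
        "participation_and_scribing": participation,
    }
    return sum(WEIGHTS[name] * score for name, score in components.items())


# The four weights should account for the whole grade.
assert math.isclose(sum(WEIGHTS.values()), 1.0)

# Hypothetical scores for illustration only.
grade = final_grade([90, 85, 80, 95, 100], quiz=80, project=88, participation=100)
print(round(grade, 1))  # 87.8
```

Here the problem-set average is 90, so the total is 0.30 × 90 + 0.25 × 80 + 0.35 × 88 + 0.10 × 100 = 27 + 20 + 30.8 + 10 = 87.8.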

Lecture Scribing

If you are enrolled in this course for credit, you are required to scribe one lecture.

The requirements for lecture scribing are as follows:

  1. On the day of lecture you may take notes however you like. Lectures will be recorded, so asynchronous participation is fine.
  2. During the week after lecture, we ask that you work with everyone assigned to scribe your lecture to compile a finalized set of notes that summarize the key points of the lecture, explain important equations, images, and plots, illustrate or describe relevant material written on the board, and describe any important questions and answers exchanged between students and the professor.
    The end goal is a compact resource that you and your classmates can use to review the important material from your lecture. The finalized notes should generally follow, and build on, the structure outlined by the headings at the beginning of the notes template.
  3. The notes template and finished scribed notes may be found here.
  4. Let the course staff know you are finished compiling the notes by sending an email to 6.874staff@mit.edu. The deadline for completing the notes will be end-of-day one week after your lecture (e.g. notes from a lecture on 2/18 will be due on 2/25 @ 11:59 PM).
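The due date is simply the lecture date plus seven days, at 11:59 PM. A minimal sketch using Python's standard `datetime` module follows; the helper name `scribe_deadline` is hypothetical, and because the page gives no year, 2021 is used purely for illustration.

```python
from datetime import date, datetime, time, timedelta


def scribe_deadline(lecture_day: date) -> datetime:
    """Return the scribing deadline: 11:59 PM, seven days after the lecture."""
    due_day = lecture_day + timedelta(days=7)
    return datetime.combine(due_day, time(hour=23, minute=59))


# The example from the policy: a lecture on 2/18 is due on 2/25 at 11:59 PM.
deadline = scribe_deadline(date(2021, 2, 18))
print(deadline.strftime("%m/%d @ %I:%M %p"))  # 02/25 @ 11:59 PM
```

Because `timedelta` handles month boundaries, a lecture on 3/30 correctly maps to a deadline of 4/6.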

Project

This subject has a substantial project component. We strongly recommend working on projects in teams of 2-3 students, but we can consider exceptions if there’s a strong justification. You are free to choose any problem in the life sciences related to the lectures of the course, and to develop a deep learning solution using the subject’s methodologies or cloud resources. We will have extensive mentoring resources to provide guidance, access to datasets, and biological insights. We will hold mentoring sessions during which you can refine your ideas in consultation with the teaching staff and research mentors for each research area.

Textbook

We will be using the book “Deep Learning” by Goodfellow, Bengio, and Courville. You can find the book online here and here. You can purchase a hard copy at MIT Press or on Amazon.

Another useful book is the Matrix Cookbook, an extensive collection of facts about matrices.