Massachusetts Institute of Technology
6.897 Principles of Fault-Tolerant Distributed Computing, Fall Term 2004
Taught by:
Professor Nancy Lynch
Dr. Gregory Chockler
Course assistant: Joanne Talbot Hanley, 32-G672A, 3-6054
Student mailing lists:
6897-students@theory,
6.897-students@theory
Course Goals and Summary:
Fault-tolerance is one of the most established, yet still actively researched, subject areas of distributed computing. Interest in the subject is driven by the ever-growing popularity of distributed systems, in which robustness is a major concern because of their inherent vulnerability to component failures and malicious attacks. This course aims to introduce students to the principles of fault-tolerance in distributed systems, covering the current state of the art and offering a glimpse of the research frontiers.
To make the exposition self-contained, we will start by reviewing fundamental concepts and classical results in distributed fault-tolerance. We will cover computation and failure models as well as basic algorithms and impossibility results. Wherever possible, the material will be presented from a modern perspective, giving an up-to-date outlook on the classical problems. The second part will deal with recent advances, current research, and open problems in the field. The topics to be discussed will include (but will not necessarily be limited to) approaches to circumventing impossibility results, computing with unreliable storage, and fault-tolerance in dynamic systems.
The course will combine lectures (the first part) with student presentations (the second part). The first part may be accompanied by a few theoretical exercises intended to reinforce the material studied in class. There will also be an option to do a practical project or to work on an open problem.