6.895 Advanced Distributed Algorithms/ Professor Nancy Lynch/ MIT

Massachusetts Insitute for Technology
6.895 Advanced Distributed Algorithms, Fall Term 2002
Professor Nancy Lynch

Course Homepage

Fall, 2002
Time: Tuesday and Thursday, 9:30-11:00am
Room: 36-155
Credits: 12 (3-0-9)
Level: Grad H

Enrollment and prerequisites: The course is intended for graduate students interested in doing research in the area of distributed algorithms. It assumes knowledge of the material in 6.852, the basic MIT distributed algorithms course. Formally, this means that you've either taken 6.852 or gotten special permission from Prof. Lynch to take 6.895 without 6.852. Students should have a strong background in mathematics, and should be familiar with basic algorithms (e.g., 6.046) and computer systems (e.g., 6.033).

Taught by Prof: Nancy Lynch, NE43-365, 3-7225, lynch@theory.lcs.mit.edu

Course assistant: Joanne Talbot Hanley, NE43-366, 3-6054

Students mailing list: 6895-students@mit.edu

Course schedule (.ps)

Handouts, papers, and notes

Course Goals and Summary:

This course will explore the recent theoretical literature on distributed algorithms, emphasizing algorithms for highly ``dynamic'' settings such as the Internet or mobile computing systems. In such environments, processes and channels may fail and recover, and may exhibit variations in their timing behavior. Participants in the algorithms may join and leave the system (as well as fail) during the course of computation. Algorithms must adapt to such changes, for example, reconfiguring on-the-fly.

The traditional theory of distributed algorithms deals mostly with fairly ``static'' settings, e.g., fixed networks or shared memory systems with a known set of participants. This theory must be ``stretched'' to cover the newer, more dynamic settings. New algorithms are needed to cope with the new possibilities. New lower bounds and other impossibility results will arise, based on new uncertainties. All of this work will require new models, and new metrics for evaluating performance and fault-tolerance.

The papers to be covered in the course represent the current state-of-the-art in the development of such a theory. Clearly, much more remains to be done. When we read the papers, we will consider what role their ideas can play in this theory, and what new work they suggest. This should help interested students to find particular questions that they can pursue for independent research.

Problems of special interest will include:
(1) Communication, including various kinds of multicast (e.g., causal, totally-ordered, atomic), gossiping protocols, and view-oriented group communication.
(2) Distributed data management, including weak and strong consistency guarantees, replicated data management, and quorum-based computing. Other problems we will consider will include failure detection, reaching consensus, resource allocation, performing distributed work, clock synchronization, and namespace management.

Since this will be a theory course, we will focus on algorithms for which it is possible to make definite, precise claims about what they do. These claims will include correctness, performance, and fault-tolerance results. We will also consider impossibility results, especially those that express inherent limitations on what can be accomplished in dynamic settings. Some not-so-theoretical papers with interesting systems ideas will also be covered; for these, we will discuss possibilities for future related theoretical work.