Massachusetts Insitute for Technology
6.895 Advanced Distributed Algorithms, Fall Term 2002
Professor Nancy Lynch
Course Homepage
Taught by Prof: Nancy Lynch, NE43-365, 3-7225, lynch@theory.lcs.mit.edu
Course assistant: Joanne Talbot Hanley, NE43-366, 3-6054
Students mailing list: 6895-students@mit.edu
Course schedule (.ps)
Course Goals and Summary:
This course will explore the recent theoretical literature on distributed algorithms, emphasizing algorithms for highly ``dynamic'' settings such as the Internet or mobile computing systems. In such environments, processes and channels may fail and recover, and may exhibit variations in their timing behavior. Participants in the algorithms may join and leave the system (as well as fail) during the course of computation. Algorithms must adapt to such changes, for example, reconfiguring on-the-fly.
The traditional theory of distributed algorithms deals mostly with fairly ``static'' settings, e.g., fixed networks or shared memory systems with a known set of participants. This theory must be ``stretched'' to cover the newer, more dynamic settings. New algorithms are needed to cope with the new possibilities. New lower bounds and other impossibility results will arise, based on new uncertainties. All of this work will require new models, and new metrics for evaluating performance and fault-tolerance.
The papers to be covered in the course represent the current state-of-the-art in the development of such a theory. Clearly, much more remains to be done. When we read the papers, we will consider what role their ideas can play in this theory, and what new work they suggest. This should help interested students to find particular questions that they can pursue for independent research.
Problems of special interest will include:
(1) Communication, including various kinds of multicast (e.g., causal,
totally-ordered, atomic), gossiping protocols, and view-oriented group communication.
(2) Distributed data management, including weak and strong consistency
guarantees, replicated data management, and quorum-based computing. Other
problems we will consider will include failure detection, reaching consensus,
resource allocation, performing distributed work, clock synchronization, and
namespace management.
Since this will be a theory course, we will focus on algorithms for which it is possible to make definite, precise claims about what they do. These claims will include correctness, performance, and fault-tolerance results. We will also consider impossibility results, especially those that express inherent limitations on what can be accomplished in dynamic settings. Some not-so-theoretical papers with interesting systems ideas will also be covered; for these, we will discuss possibilities for future related theoretical work.