(Fall 2011) 1:30-2:20 MWF, Room 205 MLH.
201P MLH, email@example.com, 319-335-0738
Office Hours: 2:30-3:30 PM MWF
Textbook, Prerequisites, Handouts, Homework and Exams, Lecture Notes, On-line resources,
Sukumar Ghosh: Distributed Systems: An Algorithmic Approach, 2006
CRC Press (ISBN 158488564)
Table of contents
In addition to the textbook, we may use the following books as references:
1. Jean Dollimore, Tim Kindberg, George Coulouris , "Distributed Systems: Concepts and Design," Addison-Wesley 2005
2. Andrew Tannenbaum, Maarten van Steen , "Distributed Systems: Principles and Paradigms," Prentice Hall (2nd edition) 2006.
3. Nancy Lynch , "Distributed Algorithms," Morgan Kaufmann 1996.
The reference books will be available at the Engineering Library.
Some knowledge of Operating Systems and/or Networking, Algorithms, and interest in Distributed Computing. This is not a programming course. Our goal is to learn and analyze why and how distributed systems work, why some of them fail, and how to tolerate failures and various dynamic behaviors.
Special course assistant : Andrew Berns, firstname.lastname@example.org, 201N MLH
Available on Wednesdays 3:30-4:40 PM
A distributed system is a network of processes collectively performing a meaningful job or providing a service to the users. Such systems are being increasing relevant to our lives and our society. How processes communicate and interact with one another, how to guarantee correctness and build tolerances to various kinds of failures or dynamic behaviors, how to design distributed algorithms for specific problems, manage replicas and provide group communication services are up for discussion. This course will deal with the theory and algorithms related to distributed systems, and not programming aspects.
Course policies are governed by the College of Liberal Arts.
A+ = 95-100 B+ = 80-84 C+ = 65-69 D+ = 50-54 F = 0-39
A = 90-94 B = 75-79 C = 60-64 D = 45-49
A- = 85-89 B- = 70-74 C- = 55-59 D- = 40-44
The instructor reserves the right to make minor modifications in the above grading scale.
Week 1 notes (8/22-8/26) |
(Chapters 1, 3)
Introductory topics -- interprocess communication models.
Week 2 notes (8/29-9/02) |
(Chapters 4, 5)
Important properties of distributed computation -- correctness criteria and proof techniques.
Week 3 notes (9/05-9/09) |
Time in a distributed system -- Physical clocks, logical clocks, sequential and concurrent events.
Read papers 1 and 5 from the on-line resources. These are the original sources.
Week 4 notes (9/12-9/16) |
(Chapters 7, 8)
Mutual exclusion -- Introduction to Distributed Snapshot
Read paper 3 from the online resources.
Week 5 notes (9/19-9/23) |
(Chapters 8, 9)
Distributed Snapshot -- Global State Collection
Week 6 notes (9/26-9/30) |
Introduction to Distributed Graph Algorithms
Week 7 notes (10/3-10/7) |
(Chapters 10, 11)
Distributed Graph Algorithms -- Coordination Algorithms
Week 8 notes (10/10-10/14) |
Introduction to Faults and Faut-tolerance
Also read paper number 4 (Arora and Gouda) from the On-line resources
Week 9 notes (10/17-10/21) |
Week 10 notes (10/24-10/28) |
Distributed Consensus (continued): Byzantine Generals Problem and Failure Detectors
Read paper 2 (Lamport, Shostak and Pease) from the on-line resources
Week 11 notes (10/31-10/4) |
Week 12 notes (11/7-11/11) |
Week 13 notes (11/14-11/18) |
Week 15 notes (11/28-12/02) |
Read papers 10 and 11.
Week 16 notes (12/05-12/09) |
Proof of Kleinberg's small-world theorem
Read paper 12
Leslie Lamport, Robert E. Shostak, Marshall C. Pease: The Byzantine Generals Problem. ACM Trans. Program. Lang. Syst. 4(3): 382-401 (1982)
K. Mani Chandy, Leslie Lamport: Distributed Snapshots: Determining Global States of Distributed Systems ACM Trans. Comput. Syst. 3(1): 63-75 (1985)
Anish Arora, Mohamed Gouda: Closure and Convergence: A Foundation of Fault-Tolerant Computing. IEEE Trans. Software Eng 19 (11) pp. 1015-1027, November 1993.
Leslie Lamport, P.M. Melliar-Smith: Byzantine Clock Synchronization. ACM PODC 1984. (pdf)
Jeffrey Dean and Sanjay Ghemawat: MapReduce: simplified data processing on large clusters. Communications of the ACM 51 (1), January 2008.
7. Tushar Deepak Chandra and Sam Toueg: Unreliable Failure Detectors for Reliable Distributed Systems. Journal of the ACM, 43:2, March 1996, 225-267. (pdf)
8. Edsger W. Dijkstra: Self-stabilizing Systems in Spite of Distributed Control. Commun. ACM 17(11): 643-644 (1974). (pdf)
9. Michael J. Fischer, Nancy A. Lynch, Michael S. Paterson: Impossibility of Distributed Consensus with One Faulty Process.Vol. 32, No. 2, pp. 374-382 (1985) (pdf)
10. Ion Stoica, Robert Morris, David Liben-Nowell, David R. Karger, M. Frans Kaashoek, Frank Dabek, Hari Balakrishnan: Chord: a scalable peer-to-peer lookup protocol for internet applications. IEEE/ACM Trans. Netw. 11(1): 17-32 (2003) (pdf)
11. James Aspnes, Gauri Shah: Skip graphs. SODA 2003: 384-393 (pdf)
12. Watts, D.J.; Strogatz, S.H. (1998). "Collective dynamics of 'small-world' networks.". Nature 393 (6684)
13. Jon Kleinberg. The small-world phenomenon: an algorithm perspective (2000) (pdf)