22C:166 Distributed Systems and Algorithms

(Fall 2011) 1:30-2:20 MWF, Room 205 MLH.


Instructor: Sukumar Ghosh
201P MLH, ghosh@cs.uiowa.edu, 319-335-0738
Office Hours: 2:30-3:30 PM MWF

Textbook, Prerequisites, Handouts, Homework and Exams, Lecture Notes, On-line resources,


Textbook


Sukumar Ghosh: Distributed Systems: An Algorithmic Approach, 2006 CRC Press (ISBN 158488564) Table of contents


In addition to the textbook, we may use the following books as references:

1. Jean Dollimore, Tim Kindberg, George Coulouris , "Distributed Systems: Concepts and Design," Addison-Wesley 2005
2. Andrew Tannenbaum, Maarten van Steen , "Distributed Systems: Principles and Paradigms," Prentice Hall (2nd edition) 2006.
3. Nancy Lynch , "Distributed Algorithms," Morgan Kaufmann 1996.

The reference books will be available at the Engineering Library.

Prerequisites

Some knowledge of Operating Systems and/or Networking, Algorithms, and interest in Distributed Computing. This is not a programming course. Our goal is to learn and analyze why and how distributed systems work, why some of them fail, and how to tolerate failures and various dynamic behaviors.

Teaching Assistant

None.

Special course assistant : Andrew Berns, andrew-berns@uiowa.edu, 201N MLH
Available on Wednesdays 3:30-4:40 PM

Course Objective

A distributed system is a network of processes collectively performing a meaningful job or providing a service to the users. Such systems are being increasing relevant to our lives and our society. How processes communicate and interact with one another, how to guarantee correctness and build tolerances to various kinds of failures or dynamic behaviors, how to design distributed algorithms for specific problems, manage replicas and provide group communication services are up for discussion. This course will deal with the theory and algorithms related to distributed systems, and not programming aspects.

Course Outline

Course policies are governed by the College of Liberal Arts.

Handouts

Homework and Exams

Four homeworks and a quiz worth 30% of the final grade. One midterm exam and a final worth 70% of the grade. The midterm examination will be scheduled in the evening.

Tentative scale for letter grades
A+ = 95-100     B+ = 80-84     C+ = 65-69     D+ = 50-54    F = 0-39
A  = 90-94      B  = 75-79     C  = 60-64     D  = 45-49
A- = 85-89      B- = 70-74     C- = 55-59     D- = 40-44

The instructor reserves the right to make minor modifications in the above grading scale.

Lecture Notes

Week 1 notes (8/22-8/26)
(Chapters 1, 3)
Introductory topics -- interprocess communication models.
Week 2 notes (8/29-9/02)
(Chapters 4, 5)
Important properties of distributed computation -- correctness criteria and proof techniques.
Week 3 notes (9/05-9/09)
(Chapter 6)
Time in a distributed system -- Physical clocks, logical clocks, sequential and concurrent events.
Read papers 1 and 5 from the on-line resources. These are the original sources.
Week 4 notes (9/12-9/16)
(Chapters 7, 8)
Mutual exclusion -- Introduction to Distributed Snapshot
Read paper 3 from the online resources.
Week 5 notes (9/19-9/23)
(Chapters 8, 9)
Distributed Snapshot -- Global State Collection
Week 6 notes (9/26-9/30)
(Chapter 10)
Introduction to Distributed Graph Algorithms
Week 7 notes (10/3-10/7)
(Chapters 10, 11)
Distributed Graph Algorithms -- Coordination Algorithms
Week 8 notes (10/10-10/14)
(Chapters 12)
Introduction to Faults and Faut-tolerance
Also read paper number 4 (Arora and Gouda) from the On-line resources
Week 9 notes (10/17-10/21)
(Chapters 13)
Distributed Consensus
Week 10 notes (10/24-10/28)
(Chapters 13)
Distributed Consensus (continued): Byzantine Generals Problem and Failure Detectors
Read paper 2 (Lamport, Shostak and Pease) from the on-line resources
Week 11 notes (10/31-10/4)
(Chapters 15)
Group Communication
Week 12 notes (11/7-11/11)
(Chapters 16)
Replication
Week 13 notes (11/14-11/18)
(Chapters 17)
Self-stabilization
Week 15 notes (11/28-12/02)
(Chapters 21)
Peer-to-Peer Networks
Read papers 10 and 11.
Week 16 notes (12/05-12/09)
(Chapters 21)
Small-world Networks
Proof of Kleinberg's small-world theorem
Read paper 12

On-line resources

1. Leslie Lamport: Time, Clocks, and the Ordering of Events in a Distributed System. Commun. ACM 21(7): 558-565 (1978) (pdf)

2. Leslie Lamport, Robert E. Shostak, Marshall C. Pease: The Byzantine Generals Problem. ACM Trans. Program. Lang. Syst. 4(3): 382-401 (1982) (pdf)

3. K. Mani Chandy, Leslie Lamport: Distributed Snapshots: Determining Global States of Distributed Systems ACM Trans. Comput. Syst. 3(1): 63-75 (1985) (pdf)

4. Anish Arora, Mohamed Gouda: Closure and Convergence: A Foundation of Fault-Tolerant Computing. IEEE Trans. Software Eng 19 (11) pp. 1015-1027, November 1993. (pdf)

5. Leslie Lamport, P.M. Melliar-Smith: Byzantine Clock Synchronization. ACM PODC 1984. (pdf)

6. Jeffrey Dean and Sanjay Ghemawat: MapReduce: simplified data processing on large clusters. Communications of the ACM 51 (1), January 2008. (pdf)

7. Tushar Deepak Chandra and Sam Toueg: Unreliable Failure Detectors for Reliable Distributed Systems. Journal of the ACM, 43:2, March 1996, 225-267. (pdf)

8. Edsger W. Dijkstra: Self-stabilizing Systems in Spite of Distributed Control. Commun. ACM 17(11): 643-644 (1974). (pdf)

9. Michael J. Fischer, Nancy A. Lynch, Michael S. Paterson: Impossibility of Distributed Consensus with One Faulty Process.Vol. 32, No. 2, pp. 374-382 (1985) (pdf)

10. Ion Stoica, Robert Morris, David Liben-Nowell, David R. Karger, M. Frans Kaashoek, Frank Dabek, Hari Balakrishnan: Chord: a scalable peer-to-peer lookup protocol for internet applications. IEEE/ACM Trans. Netw. 11(1): 17-32 (2003) (pdf)

11. James Aspnes, Gauri Shah: Skip graphs. SODA 2003: 384-393 (pdf)

12. Watts, D.J.; Strogatz, S.H. (1998). "Collective dynamics of 'small-world' networks.". Nature 393 (6684)

13. Jon Kleinberg. The small-world phenomenon: an algorithm perspective (2000) (pdf)