22C:166 Distributed Systems and Algorithms

(Fall 2011) 1:30-2:20 MWF, Room 205 MLH.

Instructor: Sukumar Ghosh
201P MLH, ghosh@cs.uiowa.edu, 319-335-0738
Office Hours: 2:30-3:30 PM MWF

Textbook, Prerequisites, Handouts, Homework and Exams, Lecture Notes, On-line resources,

Textbook

Sukumar Ghosh: Distributed Systems: An Algorithmic Approach, 2006 CRC Press (ISBN 158488564) Table of contents

In addition to the textbook, we may use the following books as references:

1. Jean Dollimore, Tim Kindberg, George Coulouris , "Distributed Systems: Concepts and Design," Addison-Wesley 2005
2. Andrew Tannenbaum, Maarten van Steen , "Distributed Systems: Principles and Paradigms," Prentice Hall (2nd edition) 2006.
3. Nancy Lynch , "Distributed Algorithms," Morgan Kaufmann 1996.

The reference books will be available at the Engineering Library.

Prerequisites

Some knowledge of Operating Systems and/or Networking, Algorithms, and interest in Distributed Computing. This is not a programming course. Our goal is to learn and analyze why and how distributed systems work, why some of them fail, and how to tolerate failures and various dynamic behaviors.

Teaching Assistant

None.

Special course assistant : Andrew Berns, andrew-berns@uiowa.edu, 201N MLH
Available on Wednesdays 3:30-4:40 PM

Course Objective

A distributed system is a network of processes collectively performing a meaningful job or providing a service to the users. Such systems are being increasing relevant to our lives and our society. How processes communicate and interact with one another, how to guarantee correctness and build tolerances to various kinds of failures or dynamic behaviors, how to design distributed algorithms for specific problems, manage replicas and provide group communication services are up for discussion. This course will deal with the theory and algorithms related to distributed systems, and not programming aspects.

Course Outline

Introduction to Distributed Systems
Models and Proof
Time and Clocks
Distributed Mutual Exclusion
Distributed Snapshot and Global States
Distributed Algorithms for Graphs
Fault and Fault-tolerance
Distributed Consensus
Group Communication
Replicated data management
Self-stabilization
Applications: peer-to-peer networks

Course policies are governed by the College of Liberal Arts.

Handouts

First day handout

Homework and Exams

Four homeworks and a quiz worth 30% of the final grade. One midterm exam and a final worth 70% of the grade. The midterm examination will be scheduled in the evening.

Homework 1 Assigned 9/7, due 9/14.
Homework 2 Assigned 9/21, due 9/28.
Homework 3 Assigned 10/7, due 10/19.
Homework 4 Assigned 11/2, due 11/11.
Quiz on Wednesday 11/16/11 in class.
Midterm Examination on Thursday, 9/29/11 (6:30-8:00 PM, Room 205 MLH)
Final Examination on Tuesday, 12/13/11 (9:45-11:45 AM, Room 205 MLH)

Tentative scale for letter grades

A+ = 95-100     B+ = 80-84     C+ = 65-69     D+ = 50-54    F = 0-39
A  = 90-94      B  = 75-79     C  = 60-64     D  = 45-49
A- = 85-89      B- = 70-74     C- = 55-59     D- = 40-44

The instructor reserves the right to make minor modifications in the above grading scale.

Lecture Notes

Week 1 notes (8/22-8/26)
(Chapters 1, 3)
Introductory topics -- interprocess communication models.

Week 2 notes (8/29-9/02)
(Chapters 4, 5)
Important properties of distributed computation -- correctness criteria and proof techniques.

Week 3 notes (9/05-9/09)
(Chapter 6)
Time in a distributed system -- Physical clocks, logical clocks, sequential and concurrent events.
Read papers 1 and 5 from the on-line resources. These are the original sources.

Week 4 notes (9/12-9/16)
(Chapters 7, 8)
Mutual exclusion -- Introduction to Distributed Snapshot
Read paper 3 from the online resources.

Week 5 notes (9/19-9/23)
(Chapters 8, 9)
Distributed Snapshot -- Global State Collection

Week 6 notes (9/26-9/30)
(Chapter 10)
Introduction to Distributed Graph Algorithms

Week 7 notes (10/3-10/7)
(Chapters 10, 11)
Distributed Graph Algorithms -- Coordination Algorithms

Week 8 notes (10/10-10/14)
(Chapters 12)
Introduction to Faults and Faut-tolerance
Also read paper number 4 (Arora and Gouda) from the On-line resources

Week 9 notes (10/17-10/21)
(Chapters 13)
Distributed Consensus

Week 10 notes (10/24-10/28)
(Chapters 13)
Distributed Consensus (continued): Byzantine Generals Problem and Failure Detectors
Read paper 2 (Lamport, Shostak and Pease) from the on-line resources

Week 11 notes (10/31-10/4)
(Chapters 15)
Group Communication

Week 12 notes (11/7-11/11)
(Chapters 16)
Replication

Week 13 notes (11/14-11/18)
(Chapters 17)
Self-stabilization

Week 15 notes (11/28-12/02)
(Chapters 21)
Peer-to-Peer Networks
Read papers 10 and 11.

Week 16 notes (12/05-12/09)
(Chapters 21)
Small-world Networks
Proof of Kleinberg's small-world theorem
Read paper 12

On-line resources

1. Leslie Lamport: Time, Clocks, and the Ordering of Events in a Distributed System. Commun. ACM 21(7): 558-565 (1978) (pdf)

2. Leslie Lamport, Robert E. Shostak, Marshall C. Pease: The Byzantine Generals Problem. ACM Trans. Program. Lang. Syst. 4(3): 382-401 (1982) (pdf)

3. K. Mani Chandy, Leslie Lamport: Distributed Snapshots: Determining Global States of Distributed Systems ACM Trans. Comput. Syst. 3(1): 63-75 (1985) (pdf)

4. Anish Arora, Mohamed Gouda: Closure and Convergence: A Foundation of Fault-Tolerant Computing. IEEE Trans. Software Eng 19 (11) pp. 1015-1027, November 1993. (pdf)

5. Leslie Lamport, P.M. Melliar-Smith: Byzantine Clock Synchronization. ACM PODC 1984. (pdf)

6. Jeffrey Dean and Sanjay Ghemawat: MapReduce: simplified data processing on large clusters. Communications of the ACM 51 (1), January 2008. (pdf)

7. Tushar Deepak Chandra and Sam Toueg: Unreliable Failure Detectors for Reliable Distributed Systems. Journal of the ACM, 43:2, March 1996, 225-267. (pdf)

8. Edsger W. Dijkstra: Self-stabilizing Systems in Spite of Distributed Control. Commun. ACM 17(11): 643-644 (1974). (pdf)

9. Michael J. Fischer, Nancy A. Lynch, Michael S. Paterson: Impossibility of Distributed Consensus with One Faulty Process.Vol. 32, No. 2, pp. 374-382 (1985) (pdf)

10. Ion Stoica, Robert Morris, David Liben-Nowell, David R. Karger, M. Frans Kaashoek, Frank Dabek, Hari Balakrishnan: Chord: a scalable peer-to-peer lookup protocol for internet applications. IEEE/ACM Trans. Netw. 11(1): 17-32 (2003) (pdf)

11. James Aspnes, Gauri Shah: Skip graphs. SODA 2003: 384-393 (pdf)

12. Watts, D.J.; Strogatz, S.H. (1998). "Collective dynamics of 'small-world' networks.". Nature 393 (6684)

13. Jon Kleinberg. The small-world phenomenon: an algorithm perspective (2000) (pdf)