CS 5620 (22C:166) Distributed Systems and Algorithms

(Spring 2015) 11:00AM - 12:15PM, 71SH


Instructor: Sukumar Ghosh
201P MLH, sukumar-ghosh@@uiowa.edu, 319-335-0738
Office Hours: 3:30PM - 5:00PM TTh or by appointment

Announcements, Prerequisites, Handouts, Homework and Exams, Lecture Notes,On-line resources,


Objective

Life in the 21st century has a growing dependence on networked services that have changed the fabric of the society. Starting from web searching, video-conferencing, stock trading and net banking, to keeping in touch with friends and peers through various kinds of social networks, network-based services play a dominant role. While networks provide the basic connectivity, the various services built on top of these networks are examples of distributed systems. The objective of this course is to help you understand some of the foundational issues in the design of distributed systems. Topics will include:

Textbook

Sukumar Ghosh: Distributed Systems and Algorithms (Second Edition), CRC Press 2014a

In addition to the textbook, we will occasionally use the following books as references:

[1] Dollimore, Tim Kindberg, George Coulouris: Distributed Systems: Concepts and Design [5th Edition], Addison-Wesley 2012
[2] Andrew Tannenbaum, Maarten van Steen: Distributed Systems: Principles and Paradigms (Second edition), Prentice Hall 2006
[3] Nancy Lynch : Distributed Algorithms. Morgan Kaufmann 1996

You need not purchase the reference books. They should be available in the library.

Prerequisites

Some knowledge of Operating Systems and/or Networking, Algorithms, and interest in Distributed Computing. Our goal is to learn and analyze why and how distributed systems work, why some of them fail, and how to tolerate failures and cope with various dynamic behaviors.

Teaching Assistant

None.

Course policies are governed by the College of Liberal Arts.

Handouts

Homework and Exams

Five homeworks worth 35% of the final grade. One midterm exam (30%), and a final exam (35%).

Tentative scale for letter grades
A+ = 95-100     B+ = 80-84     C+ = 65-69     D+ = 50-54    F = 0-39
A  = 90-94      B  = 75-79     C  = 60-64     D  = 45-49
A- = 85-89      B- = 70-74     C- = 55-59     D- = 40-44

The instructor reserves the right to make minor modifications in the above grading scale.

Lecture Notes

Week 1 (1/19-1/23). Introduction to Distributed Systems
Read Chapters 1-3
Week 2 (1/26-1/30). Non-determinism, Atomicity, Fairness, Liveness and Safety
Read Chapters 4-5
Week 3 (2/2-2/6). Time and Clock
Read Chapter 6
Week 4 (2/9-2/13). Distributed Mutual Exclusion
Read Chapter 7
Week 5 (2/16-2/20). Distributed Snapshot
Read Chapter 8
Week 6-7(2/23-3/6). Global State Collection
Read Chapter 9
Week 8 (3/9-3/13). Graph Algorithms
Read Chapter 10
Week 9 (3/23-3/27). Coordination Algorithms
Read Chapter 11
Week 9 (3/30-4/2). Faults and Fault-tolerance
Read Chapter 12
Week 10 (4/6-4/10). Distributed Consensus
Read Chapter 13
Week 11 (4/13-4/17). Distributed Consensus (wrap-up), Failure detectors
Read Chapter 13
Week 12 (4/20-4/24). Group Communication
Read Chapter 15
Week 13 (4/27-5/1). Replication
Read Chapter 16

Online resources

1. Leslie Lamport: Time, Clocks, and the Ordering of Events in a Distributed System. Commun. ACM 21(7): 558-565 (1978) (pdf)

2. Leslie Lamport, Robert E. Shostak, Marshall C. Pease: The Byzantine Generals Problem. ACM Trans. Program. Lang. Syst. 4(3): 382-401 (1982) (pdf)

3. K. Mani Chandy, Leslie Lamport: Distributed Snapshots: Determining Global States of Distributed Systems ACM Trans. Comput. Syst. 3(1): 63-75 (1985) (pdf)

4. Anish Arora, Mohamed Gouda: Closure and Convergence: A Foundation of Fault-Toler ant Computing. IEEE Trans. Software Eng 19 (11) pp. 1015-1027, November 1993. (pdf)

5. Leslie Lamport, P.M. Melliar-Smith: Byzantine Clock Synchronization. ACM PODC 19 84. (pdf)

6. Jeffrey Dean and Sanjay Ghemawat: MapReduce: simplified data processing on large clusters. Communications of the ACM 51 (1), January 2008. (pdf)

7. Tushar Deepak Chandra and Sam Toueg: Unreliable Failure Detectors for Reliable D istributed Systems. Journal of the ACM, 43:2, March 1996, 225-267. (pdf)

8. Edsger W. Dijkstra: Self-stabilizing Systems in Spite of Distributed Control. Commun. ACM 17(11): 643-644 (1974). (pdf)

9. Michael J. Fischer, Nancy A. Lynch, Michael S. Paterson: Impossibility of Dist ributed Consensus with One Faulty Process.Vol. 32, No. 2, pp. 374-382 (1985)

10. Ion Stoica, Robert Morris, David Liben-Nowell, David R. Karger, M. Frans Kaashoek, Frank Dabek, Hari Balakrishnan: Chord: a scalable peer-to-peer lookup protocol for internet applications. IEEE/ACM Trans. Netw. 11(1): 17-32 (2003) (pdf)

11. James Aspnes, Gauri Shah: Skip graphs. SODA 2003: 384-393 (pdf)

12. Watts, D.J.; Strogatz, S.H. (1998). "Collective dynamics of 'small-world' networks.". Nature 393 (6684)

13. Jon Kleinberg. The small-world phenomenon: an algorithm perspective (2000) (pdf)

14. Leslie Lamport: Paxos made Simple. ACM SIGACT News (Distributed Computing Column) 32, 4 (No. 121, December 2001) 51-58. (pdf)

15. Hector Garcia-Molina: Elections in a Distributed Computing System. IEEE Trans. Computers (1982) (pdf)

16. Baruch Awerbuch: The Complexity of Network Synchronization. Journal of ACM (1985) (pdf)