22S:295:002
High Performance Computing in Statistics
Luke Tierney and Kate Cowles
Fall 2007
Information and Resources
- Class meets 9:00-10:00 AM on Thursdays in Schaeffer 241B.
- Computing resources.
- Class notes:
- Introduction, August 30, 2007.
- The Statistics Cluster, September 6, 2007.
- Introduction to snow, September 13, 2007.
- Some snow Examples, September 20, 2007; code examples.
- More snow Examples, September 27, 2007; kriging example, image
processing example, and image data.
- Using MPI in C programs, October 4, 2007; code examples.
- Brief Introduction to OpenMP, October 11, 2007.
- Batch Scheduling and Resource Management, October 18, 2007.
- Managing/analyzing the Netflix data, October 25, 2007; handout;
code.
- A Parallel Approach to Microarray Preprocessing and Analysis,
November 1, 2007.
- The MapReduce Framework, November 8, 2007; mapper and reducer for
word counting; mapper for movie review counting.
- Introduction to PLAPACK, November 15, 2007; some supporting
materials.
- Grid Computing and the TeraGrid, November 29, 2007; some supporting
materials.
- Some changes in snow and R, December 13, 2007.
- Cluster notes:
References
- The Landscape of Parallel Computing Research: A View from
Berkeley (pdf and wiki)
- First chapter of
Handbook of Parallel Computing and Statistics, Erricos John
Kontoghiorghes (Ed.), 2006.
- PyStream: Stream and GPU computing in Python.
- IPython and parallel computing with IPython.
- Parallel Programming with MPI, by Peter Pacheco,
Morgan Kaufmann, 1997.
- PVM: Parallel Virtual Machine by Al Geist et al., MIT
Press, 2000.
- Simple Parallel Statistical Computing in R, by A. J. Rossini,
L. Tierney, and N. Li, Journal of Computational and Graphical
Statistics, 16 (2), 399-420, 2007. Technical report version
available from COBRA.
- Parallel Programming in OpenMP by Rohit Chandra et al.,
Morgan Kaufmann, 2001.
- Patterns for Parallel Programming by Timothy
G. Mattson, Beverly A. Sanders, and Berna L. Massingill, Addison
Wesley, 2005.
- Condor home page and manuals.
- MapReduce: Simplified Data Processing on Large Clusters, by
Jeffrey Dean and Sanjay Ghemawat, OSDI'04: Sixth Symposium on
Operating System Design and Implementation, San Francisco, CA,
December 2004. Available online.
- Wikipedia entry for MapReduce.
- Some MapReduce software: Hadoop from Apache; IBM MapReduce Tools
for Eclipse; the Pig project; the Sawzall language.
Luke Tierney 2007-12-13