Data Visualization with TDA Mapper

Spring 2018

Instructor: Dr. Isabel K. Darcy, Department of Mathematics, AMCS, and Informatics, University of Iowa
Office: B1H MLH
Phone: 335-0778
Email: isabel-darcy AT uiowa.edu
Office hours: Tuesdays 8:50 - 9:15am and 12:30 - 2:00pm+, Thursdays 8:50 - 9:15am and 12:30 - 12:40pm+, and by appointment (Note: + means I will normally be available for longer).

TA: Wako Bungula
Office: 325L MLH
Office hours: Tuesdays/Thursdays 9:30 - 11:00am and by appointment.
Email: wako-bungula@uiowa.edu

Math 3900 syllabus including grading scheme


TDA Mapper was developed by Gurjeet Singh, Facundo Memoli and Gunnar Carlsson. The company Ayasdi is based on the Mapper algorithm. Both Python and R versions are freely available. The algorithm is very simple: bin the data into overlapping bins, cluster each bin, and create a graph where vertices = clusters and two clusters are connected by an edge if they have points in common.
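The three steps just described (overlapping bins, clustering within each bin, graph on the clusters) can be sketched in a few lines of Python. This is only a minimal illustration, not the Python Mapper or R implementation used in the labs: the function names, the parameters (n_intervals, overlap, eps), and the choice of a simple distance-threshold (single-linkage style) clustering are all assumptions made for the example.

```python
import math
from itertools import combinations

def cover_intervals(lo, hi, n_intervals, overlap):
    """Split [lo, hi] into n equal intervals, each widened so neighbors overlap."""
    width = (hi - lo) / n_intervals
    pad = width * overlap / 2
    return [(lo + i * width - pad, lo + (i + 1) * width + pad)
            for i in range(n_intervals)]

def threshold_clusters(points, indices, eps):
    """Cluster the given point indices: two points share a cluster when they
    are joined by a chain of points whose consecutive distances are <= eps."""
    unvisited = set(indices)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        component, stack = {seed}, [seed]
        while stack:
            i = stack.pop()
            near = {j for j in unvisited
                    if math.dist(points[i], points[j]) <= eps}
            unvisited -= near
            component |= near
            stack.extend(near)
        clusters.append(frozenset(component))
    return clusters

def mapper(points, filter_fn, n_intervals=3, overlap=0.5, eps=0.4):
    """Return (nodes, edges): nodes are clusters (sets of point indices),
    edges join clusters that have at least one point in common."""
    values = [filter_fn(p) for p in points]
    nodes = []
    for a, b in cover_intervals(min(values), max(values), n_intervals, overlap):
        in_bin = [i for i, v in enumerate(values) if a <= v <= b]  # bin step
        nodes.extend(threshold_clusters(points, in_bin, eps))      # cluster step
    edges = [(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]]                               # graph step
    return nodes, edges

# Example: 20 points on a circle, filtered by x-coordinate.
circle = [(math.cos(2 * math.pi * k / 20), math.sin(2 * math.pi * k / 20))
          for k in range(20)]
nodes, edges = mapper(circle, filter_fn=lambda p: p[0])
print(len(nodes), "clusters;", "edges:", edges)
```

With these (assumed) parameter values the middle bin splits into a top arc and a bottom arc, so the resulting graph is a loop, reflecting the shape of the circle. Changing the number of bins, the overlap, or eps can change the graph, which is why parameter choice gets attention in the lectures and labs.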

TENTATIVE CLASS SCHEDULE-ALL DATES SUBJECT TO CHANGE (click on date/section for pdf file of corresponding class material):

During the first 8 weeks we will introduce the TDA mapper algorithm including all needed background (e.g., clustering, PCA, some graph theory, etc.). On most Thursdays we will meet in the B5 MLH computer lab to run software related to the previous Tuesday's lecture. Explicit directions will be provided. The TA and I will be available to provide assistance. Labs can be incorporated into your project. In weeks 9 - 12, we will compare TDA mapper to other data analysis/visualization software. The remaining weeks will include lectures, mentoring talks and group presentations.

Click here for information regarding all assignments including ICON quizzes, HW, presentations, and project.

Tentative Schedule and HW/Announcements
Week 1
1/16 Professor Gunnar Carlsson Introduces Topological Data Analysis, Mapper slides Icon Quiz 1 (Due 1/18 at 7am) over TDA Mapper videos:
Intro, slides
Examples, slides
Summary, slides

HW 1 (Due 1/18) --2 points
Add your info (picture, name, interests) to our course wiki class list for Math 3900 or course wiki class list for Math 7450

HW 2 (Due 1/18) --5 points
Complete worksheet

1/18 Meet in B5 MLH: Lab 1 files, Goals
Start a draft of a poster introducing the TDA mapper algorithm.
Extracting insights from the shape of complex data using topology P. Y. Lum, G. Singh, A. Lehman, T. Ishkanov, M. Vejdemo-Johansson, M. Alagappan, J. Carlsson, G. Carlsson (2013) video, pptx, pdf

Exploring data with topological tools, Marinka Zitnik, XRDS: Crossroads, The ACM Magazine for Students, 2014

FYI: Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition G. Singh, F. Memoli, G. Carlsson (2007)

http://yann.lecun.com/exdb/mnist/
Using the MNIST Dataset

Week 2
1/23 R/Rstudio, More TDA mapper: Filter functions I, Clustering I HW 3 (Due 1/23) -- 5 points :
Draft of a poster introducing the TDA mapper algorithm. (note poster can be printed on normal letter size paper -- make sure you use a readable font size)

HW 4 (due Thursday 1/25, 10 points):
Do a few Swirl submodules (1 point each for up to 10 submodules of any combination of Swirl courses).

1/25 Meet in B5 MLH: Lab 2 goals, Lab 2 files
FYI: scikit-learn clustering
Week 3
1/30 Filter functions II, Clustering II HW 5 (Due 2/5) -- 5 points:
Poster introducing the TDA mapper algorithm (note poster can be printed on normal letter size paper -- make sure you use a readable font size).
2/1 Meet in B5 MLH: Lab 3
Week 4
2/6 Jupyter, Simplicial complexes, etc., Distances Icon Quiz 2 Review (10 points; Due 2/6 at 7:00 AM)

Project (Due 2/6) Intro draft including 2, 3, 7-10

2/8 Meet in B5 MLH: Lab 4, Jupyter lab files
Week 5
2/13 coloring, KS statistics Project (Due 2/13) Intro draft including 2, 3, 7-10
2/15 Meet in B5 MLH: Lab 5 files, goals
Week 6
2/20 KS statistics, Distances, Talk advice, Github Project (Due 2/22) Writing fellow version

2/22 Meet in B5 MLH
Week 7
2/27 More TDA mapper examples Mini-presentation slides first draft due Monday 2/26 at 10am (10 - 20 points)

Mini-presentation slides final draft due Wednesday 2/28 at 10am (10 - 0 points)

HW 6 (Due 3/6) -- 10 points : Practice exam

3/1 Mini-presentations
Week 8
3/6 Review, (with notes) Icon Quiz 3 Review (25 points; Due 3/5)

Project (due 3/9) Revision of 2/22 version based on writing fellow comments -- extension may be requested depending on suggested revisions.

3/8 Midterm (50 points)
*****Spring Break March 12 - 19****
Week 9
3/20 Clustering Icon Quiz 4 (Due Thursday 3/22 at 7:00 AM) over Voronoi (6:06 min) and k-means (9:10 min)
3/22 LAB meet in B5 MLH (basement computer lab), goals
Week 10
3/27 Clustering; PCA Project (Due Tuesday, 3/27)
Polished draft including 2, 3, 4, 7 - 10.
3/29 Lab Files: meet in B5 MLH (basement computer lab)
Week 11
4/3 PCA; MDS Project (due 4/3)
Draft of your project which should be at least 80% done.
4/5 LAB meet in B5 MLH (basement computer lab), goals
Week 12
4/10 MDS; other methods Icon Quiz 5 Review (9 points; Due 4/10 at 7:00 AM)

Polished Project (due 4/12): Do not include unfinished sections.

4/12 Meet in B5 MLH: Lab files
Week 13
4/17 other methods; ethics Project (due 4/17) Final project including unfinished sections.

Final presentation slides first draft (due 4/19) -- 15 pts

4/19 Industry Guest Lecturer
Week 14
4/24 Guest Lecturer Wako Bungula: Grad School/Mapper Icon Quiz 6 Review (9 points; Due 4/24)

Final presentation slides 2nd draft (due 4/26, print 6 slides/page and bring to class) -- 15 pts

Finished Project due 4/28.

4/26 Ethics/Data Privacy
Week 15
5/1 Student presentations
Final presentation slides final version (due 4/30 noon) -- 10 pts
5/3 Student presentations

Mapper References:

G. Bowman, X. Huang, Y. Yao, J. Sun, G. Carlsson, L. Guibas and V. Pande, "Structural insight into RNA Hairpin Folding Intermediates", Journal of American Chemical Society (Communications). Jul 2008

Topology and data, G Carlsson (2009) (Mapper: p. 281 - 289)

Topological methods for exploring low-density states in biomolecular folding pathways (2009)
Nov 22 video, pptx, pdf
An eQTL biological data visualization challenge and approaches from the visualization community (2011)

Topology based data analysis identifies a subgroup of breast cancers with a unique mutational profile and excellent survival M. Nicolau, A. J. Levine, G. Carlsson (2011) video, pptx, pdf

video, pptx, pdf Download Mapper for Matlab
Python Mapper
Graphviz
Web tool: Progression Analysis of Disease - PAD (includes Mapper)
Ayasdi Iris, academic trial

DNA MICROARRAY VIRTUAL LAB, youtube video
How to Analyze DNA Microarray Data, Howard Hughes Medical Institute
Pearson Product-Moment Correlation

Nov 20 video, pptx, pdf Intro to RNA & Topological Landscapes for Visualization of Scalar-Valued Functions.
Generating and exploring a collection of topological landscapes for visualization of scalar-valued functions. by W. Harvey and Y. Wang, Comput. Graphics Forum (Special issue from EuroVis) 2010

Topological data analysis of Escherichia coli O157:H7 and non-O157 survival in soils (Sept 2014)

Topological methods reveal high and low functioning neuro-phenotypes within fragile X syndrome (Sept 2014)

Topological data analysis for discovery in preclinical spinal cord injury and traumatic brain injury (Oct 2015)

A Tool for Interactive Data Visualization: Application to Over 10,000 Brain Imaging and Phantom MRI Data Sets (March 2016)

Additional readings
Mathworks Matlab Tutorials
Kaggle data analysis competitions
Data for MATLAB hackers (pre-2010)