MATH:3900 Introduction to Mathematics Research:

Data Analysis with TDA Mapper

Spring 2017 Section 0001: 11:00A - 12:15P TTh 113 MLH

Instructor:  Dr. Isabel K. Darcy Department of Mathematics and AMCS, University of Iowa
Office: B1H MLH
Phone: 335-0778
Email: isabel-darcy AT uiowa.edu

Office hours: Tuesdays/Thursdays 8:50 - 9:15am, 12:30 - 1:35pm, and by appointment.

TA: Maria Gommel
Office: B20J MLH
Office hours in Math Lab (125 MLH) Wednesdays from 1:30-2:30pm and 3:30-4:30pm and by appointment.

Click here for rest of syllabus including grading scheme


TDA Mapper was developed by Gurjeet Singh, Facundo Memoli and Gunnar Carlsson. The company Ayasdi is based on the Mapper algorithm. Both Python and R versions are freely available. The algorithm is simple: bin the data into overlapping bins, cluster the points within each bin, then create a graph in which each vertex is a cluster and two vertices are joined by an edge if their clusters have points in common.
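The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the Python Mapper or R implementation: it assumes a one-dimensional filter value per data point, equal-width overlapping intervals as bins, and a simple single-linkage rule (points closer than a threshold `eps` end up in the same cluster). The names `overlapping_bins`, `cluster`, `mapper`, `n_bins`, `overlap` and `eps` are hypothetical and chosen just for this example.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def overlapping_bins(values, n_bins, overlap):
    """Cover [min, max] of the filter values with n_bins overlapping intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    pad = width * overlap / 2  # each interval is widened by pad on both sides
    return [(lo + i * width - pad, lo + (i + 1) * width + pad)
            for i in range(n_bins)]


def cluster(indices, points, eps):
    """Single-linkage clustering (an assumed, simple choice of clusterer):
    group indices into connected components where neighbours are within eps."""
    clusters = []
    unvisited = set(indices)
    while unvisited:
        seed = unvisited.pop()
        component, frontier = {seed}, [seed]
        while frontier:
            p = frontier.pop()
            near = {q for q in unvisited if dist(points[p], points[q]) <= eps}
            unvisited -= near
            component |= near
            frontier.extend(near)
        clusters.append(frozenset(component))
    return clusters


def mapper(points, filter_values, n_bins=3, overlap=0.5, eps=1.0):
    """Return (vertices, edges): vertices are clusters (sets of point indices),
    edges join clusters that have at least one point in common."""
    vertices = []
    for a, b in overlapping_bins(filter_values, n_bins, overlap):
        in_bin = [i for i, f in enumerate(filter_values) if a <= f <= b]
        vertices.extend(cluster(in_bin, points, eps))
    edges = {(u, v) for u, v in combinations(range(len(vertices)), 2)
             if vertices[u] & vertices[v]}
    return vertices, edges


# Toy data: ten evenly spaced points on a line; the filter is the coordinate itself.
points = [(float(x),) for x in range(10)]
filter_values = [p[0] for p in points]
vertices, edges = mapper(points, filter_values, n_bins=3, overlap=0.5, eps=1.0)
print(sorted(edges))  # [(0, 1), (1, 2)]
```

On this toy data each bin yields one cluster, and neighbouring bins share a point, so the resulting graph is a path with three vertices. In practice the filter function, the bin count, the amount of overlap and the clustering method are all choices that affect the output graph.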

TENTATIVE CLASS SCHEDULE-ALL DATES SUBJECT TO CHANGE (click on date/section for pdf file of corresponding class material):

Tentative Schedule | HW/Announcements
Week 1
1/17 Professor Gunnar Carlsson Introduces Topological Data Analysis, Mapper slides, worksheet
Icon Quiz 1 (Due 1/19 at 7:00 AM) over Voronoi (6:06 min) and k-means (9:10 min)

HW 1 (Due 1/19) -- 10 points
1.) Add your info (picture, name, interests) to our course wiki class list
2.) Choose one clustering method and add some references to the course reference wiki.

1/19 Meet in B5 MLH: Lab 1 files , slides
FYI: scikit-learn clustering
Week 2
1/24 More TDA mapper
HW 2 (Due 1/24) -- 5 points:
Start writing code to create your own TDA mapper. Note: you only need to outline the algorithm using comments.

Project HW 1 (Due 1/26) -- 5 points
Download a data set and answer the following questions:

  • a.) Where did you get your data?
  • b.) Briefly describe your data
  • c.) What format is your data in (e.g., Excel, text, etc.)?
  • d.) How many data points are in your data set?
  • e.) Does your data live in a fixed dimension and if so, what is that dimension?

HW 3 (due Thursday 1/26, 10 points):
Do a few Swirl submodules (1 point each for up to 10 submodules of any combination of Swirl courses).

Project HW 2 (Due 1/26) -- 10 points
Draft of slides describing a clustering method.

1/26 Meet in B5 MLH: Lab 2
FYI:
Week 3
1/31 Mapper Examples
Icon Quiz 2 (Due 2/2 at 7:00 AM) over TDA Mapper videos:
Intro, slides
Examples, slides
Summary, slides

Project HW 3 (Due 2/2) -- 10 points
Write a 2-5 page description of a clustering method (include R command(s)).
Note: You may instead focus on creating a poster introducing the TDA mapper algorithm for the informatics symposium (draft due 2/7)

2/2 Mapper applied to cancer data
Week 4
2/7 Project HW 4 (Due 2/7) -- 10 points
Slides describing a clustering method or poster introducing the TDA mapper algorithm.
2/9 Mini-presentations on clustering or TDA mapper (50 points)
Week 5
2/14 Lab files
Project HW (Due 2/17)
Intro draft including 2, 8, 10
2/16 Github (optional)
Week 6
2/21 slides, Ayasdi resources, Patient Stratification, Iris data set, databasics.r, flaresTransformed.r
Icon Quiz 3 (10 points; Due 2/23 at 7:00 AM) over first 20 minutes of Applications of TDA to the Understanding of Disease and Drug Discovery, Pek Lum, Ayasdi
2/23 slides, Identification of type 2 diabetes subgroups through topological analysis of patient similarity, Science Translational Medicine, 2015, Precision Medicine Using Topological Data Analysis
Week 7
2/28 slides
Project HW 5 (Due 2/28) -- 10 points:
Analyze dataset as described here

Icon Quiz 4 (10 points; Due 2/28 at 7:00 AM) over first 40 minutes of Applications of TDA to the Understanding of Disease and Drug Discovery, Pek Lum, Ayasdi

Icon Quiz 5 (10 points; Due 3/2 at 7:00 AM) over Applications of TDA to the Understanding of Disease and Drug Discovery, Pek Lum, Ayasdi

Icon Quiz 6 Review (50 points; Due 3/2 at 7:00 AM)

Project HW 6 (Due 3/4) -- 20 points:
Analyze dataset as described here

3/2 slides
Week 8
3/7 Midterm (100 points)
Project HW (Due Sunday 3/5)
Intro draft including 2, 3, 8, 10. Focus on description of TDA mapper algorithm.
3/9 slides
*****Spring Break March 12 - 19****
Week 9
3/21 Exploring data with topological tools, Marinka Zitnik, XRDS: Crossroads, The ACM Magazine for Students, 2014, slides
Icon Quiz 7 Reading (10 points; Due 3/21 at 7:00 AM)
Icon Quiz 8 Reading (10 points; Due 3/23 at 7:00 AM)
3/23 A Topological Data Analysis Approach to Visualizing Ebola Tweets, Herchel Thaddeus Machacon, 2016, slides
Week 10
3/28 KS statistics, ks.r
Project HW (Due Monday, 3/27)
Polished draft including 2, 3, 4, 7 - 10.

3/30 LAB: meet in B5 MLH (basement computer lab)
Week 11
4/4 slides
Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition. G. Singh, F. Memoli and G. Carlsson. Symposium on Point Based Graphics 2007, Prague, September 2007
Project HW (due 4/5)
Draft of your project which should be at least 50% done.

Icon Quiz 9 Reading (10 points; Due 4/6 at 7:00 AM)

4/6 Simplicial complex and Topology
Week 12
4/11 Multi-resolution/multi-scale
Icon Quiz 10 Reading (10 points; Due 4/11 at 7:00 AM)

Icon Quiz 11 Reading (10 points; Due 4/13 at 7:00 AM)

4/13 Mini-presentations
Week 13
4/18 Clustering and Multi-dimensional Mapper
Project HW (due 4/17)
Draft of your project which should be at least 80% done.

Icon Quiz 12 Reading (10 points; Due 4/20 at 7:00 AM)

4/20 Mapper on 3D Shape Database
Week 14
4/25 Meet in computer lab B5 MLH
Finished Project due 4/26
Project slides due 4/30
4/27 Guest Lecturer Wako Bungulo: Grad School/Mapper
PCA , Euler characteristic
Week 15
5/2 Student presentations (200 pts)
Big Data
HW 9 (due 5/2) Summarize May 2nd presentations (0 points)
HW 10 (due 5/4) Summarize May 4th presentations (0 points)
Project HW (due 5/4) Outside presentations, etc (300 pts)
5/4 Student presentations (200 pts)
Cleaning Data


Mapper References:

Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition G. Singh , F. Memoli, G. Carlsson (2007)

G. Bowman, X. Huang, Y. Yao, J. Sun, G. Carlsson, L. Guibas and V. Pande, "Structural insight into RNA Hairpin Folding Intermediates", Journal of American Chemical Society (Communications). Jul 2008

Topology and data, G Carlsson (2009) (Mapper: p. 281 - 289)

Topological methods for exploring low-density states in biomolecular folding pathways (2009)
            Nov 22 video, pptx, pdf
An eQTL biological data visualization challenge and approaches from the visualization community (2011)

Topology based data analysis identifies a subgroup of breast cancers with a unique mutational profile and excellent survival M. Nicolau, A. J. Levine, G. Carlsson (2011) video, pptx, pdf

Extracting insights from the shape of complex data using topology P. Y. Lum, G. Singh, A. Lehman, T. Ishkanov, M. Vejdemo-Johansson, M. Alagappan, J. Carlsson, G. Carlsson (2013) video, pptx, pdf

video, pptx, pdf Download Mapper for Matlab
Python Mapper
Graphviz
Web tool: Progression Analysis of Disease - PAD (includes Mapper)
Ayasdi Iris, academic trial

DNA MICROARRAY VIRTUAL LAB, youtube video
How to Analyze DNA Microarray Data, Howard Hughes Medical Institute
Pearson Product-Moment Correlation

Nov 20 video, pptx, pdf Intro to RNA & Topological Landscapes for Visualization of Scalar-Valued Functions.
Generating and exploring a collection of topological landscapes for visualization of scalar-valued functions. by W. Harvey and Y. Wang, Comput. Graphics Forum (Special issue from EuroVis) 2010

Topological data analysis of Escherichia coli O157:H7 and non-O157 survival in soils (Sept 2014)

Topological methods reveal high and low functioning neuro-phenotypes within fragile X syndrome (Sept 2014)

Topological data analysis for discovery in preclinical spinal cord injury and traumatic brain injury (Oct 2015)

A Tool for Interactive Data Visualization: Application to Over 10,000 Brain Imaging and Phantom MRI Data Sets (March 2016)

Additional readings
Mathworks Matlab Tutorials
Kaggle data analysis competitions
Data for MATLAB hackers (pre-2010)
http://yann.lecun.com/exdb/mnist/
Using the MNIST Dataset

##############################################################

YOU CAN IGNORE EVERYTHING BELOW THIS LINE.

Tentative Schedule:

Persistent Topology and Metastable State in Conformational Dynamics

Week 1 Topological Data Analysis: Software Mapper
Aug 26 video, pptx, pdf Extracting insights from the shape of complex data using topology P. Y. Lum, G. Singh, A. Lehman, T. Ishkanov, M. Vejdemo-Johansson, M. Alagappan, J. Carlsson, G. Carlsson (2013)
Aug 28 video, pptx, pdf Topology based data analysis identifies a subgroup of breast cancers with a unique mutational profile and excellent survival M. Nicolau, A. J. Levine, G. Carlsson (2011)
Aug 30 video, pptx, pdf Download Mapper for Matlab
Python Mapper
Graphviz
Web tool: Progression Analysis of Disease - PAD (includes Mapper)
Ayasdi Iris, academic trial
Additional readings:
Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition G. Singh , F. Memoli, G. Carlsson (2007)
Topology and data, G Carlsson (2009) (Mapper: p. 281 - 289)

DNA MICROARRAY VIRTUAL LAB, youtube video
How to Analyze DNA Microarray Data, Howard Hughes Medical Institute
Pearson Product-Moment Correlation

Nov 20 video, pptx, pdf Intro to RNA & Topological Landscapes for Visualization of Scalar-Valued Functions.
Generating and exploring a collection of topological landscapes for visualization of scalar-valued functions. by W. Harvey and Y. Wang, Comput. Graphics Forum (Special issue from EuroVis) 2010
Nov 22 video, pptx, pdf Topological methods for exploring low-density states in biomolecular folding pathways

An eQTL biological data visualization challenge and approaches from the visualization community

G. Bowman, X. Huang, Y. Yao, J. Sun, G. Carlsson, L. Guibas and V. Pande, "Structural insight into RNA Hairpin Folding Intermediates", Journal of American Chemical Society (Communications). Jul 2008

Week 11
4/4 Discuss Preparatory Lecture 1: The Euler characteristic (20:32 min)
Optional FYI: Mobius band, Klein bottle
Jeff Weeks: "Shape of Space" book, 2013 video (60 min), software, games
The Geometry Center Shape of Space Video (11 min)
Project HW (due 4/5)
Draft of your project which should be at least 50% done.

Icon Quiz 9 Reading (10 points; Due 4/4 at 7:00 AM)

Icon Quiz 10 Reading (10 points; Due 4/6 at 7:00 AM)

Icon AT Quiz 1 (Due 4/4 at 7:00 AM) over Preparatory Lecture 1

Icon AT Quiz 2 (Due 4/6 at 7:00 AM) over Preparatory Lecture 2

4/6 Discuss Preparatory Lecture 2: Addition and Free Abelian Groups (17:53 min),  Worksheet 1,  answers
Intro to Data Analysis slides
Week 12
4/11 Discuss Preparatory Lecture 3: Modular Arithmetic (9:40 min)  Worksheet 2,  
Installing R and Rstudio, tips, pptx
Start discussing On the Local Behavior of Spaces of Natural Images, Gunnar Carlsson, Tigran Ishkhanov, Vin de Silva, Afra Zomorodian, International Journal of Computer Vision January 2008, Volume 76, Issue 1, pp 1-12.
slides
Icon Quiz 14 Reading (10 points; Due 4/11 at 7:00 AM)

Icon Quiz 15 Reading (10 points; Due 4/13 at 7:00 AM)

Icon AT Quiz 3 (Due 4/11 at 7:00 AM) over Preparatory Lecture 3

Icon AT Quiz 4 (Due 4/13 at 7:00 AM) over Preparatory Lecture 4

4/13 Discuss Preparatory Lecture 4: Addition and Free Vector Spaces (21:26 min)
 Worksheet 3, Continue image analysis discussion
Week 13
4/18 Discuss Preparatory Lecture 5: Triangulations and Simplicial Complexes (28:19 min)
 Worksheet 4, Continue image analysis discussion
Project HW (due 4/17)
Draft of your project which should be at least 80% done.

Icon Quiz 16 Reading (10 points; Due 4/20 at 7:00 AM)

Icon AT Quiz 5 (Due 4/18 at 7:00 AM) over Preparatory Lecture 5

Icon AT Quiz 6 (Due 4/20 at 7:00 AM) over Preparatory Lecture 6

4/20 Discuss Preparatory Lecture 6: Creating a Simplicial Complex from Data (28:15 min),
Equivalence relations and partitions, worksheet 5
Week 14
4/25 Homology example, Create Your Own Homology
Finished Project due 4/26
Project slides due 4/30
4/27 Barcodes (pptx), Persistence Diagrams
Week 15
5/2 Student presentations (200 pts)
HW 9 (due 5/2) Summarize May 2nd presentations
HW 10 (due 5/4) Summarize May 4th presentations
Project HW (due 5/4) Outside presentations, etc (300 pts)
5/4 Student presentations (200 pts)