Text Retrieval & Text Mining Reading Group

Fall 2006

Tuesday 2:30 pm to 4:00 pm.

School of Library & Information Science (meeting room)

3rd floor Main Library (here is a map showing the building location.
The room is on the side closest to Burlington Street)

Conference Deadlines:

RIAO (December 1 deadline - Pittsburgh)
SIGIR 2007 (January 28 deadline - Amsterdam)
WWW 2007 Conference (November 20 deadline; poster deadline TBA - Banff)

Goal: To study current papers from journals and conference proceedings in text retrieval and text mining. Examples of problems include novelty detection, web retrieval and web mining, ranking strategies, ambiguity resolution, knowledge discovery, web phenomena including social networks, information extraction, and text classification. The reading group is led by Professor Padmini Srinivasan. Interested students (from beginning to advanced) and faculty are invited to participate. The format is informal, with individuals taking turns to present an overview of the selected paper and lead the discussion. This forum has resulted in collaborative projects and published papers.

Special Focus: We will continue to read papers from different proceedings and journals. Additionally, this semester we will take a close look at some of the TREC tracks. (TREC is an international forum for testing algorithms and models on well-defined problems.) Participants are encouraged to suggest readings aligned with their interests.

Note: If you would like to attend the reading group sessions but have a timing conflict, please let me know.

  1. August 25, 2006: TREC web site. An overview.
    2005 TREC proceedings

    Ellen M. Voorhees. Overview of TREC 2005.

  2. September 1, 2006: (get all papers from TREC proceedings. Focus on Enterprise track descriptions only).

    1. Craswell N, de Vries A.P, Soboroff I. Overview of the TREC 2005 Enterprise Track.
    2. Macdonald C, He B, Plachouras V, Ounis I. University of Glasgow at TREC 2005: Experiments in Terabyte and Enterprise Tracks with Terrier.
    3. Fu Y, Yu W, Li Y, Liu Y, Zhang M. Tsinghua University (State Key Lab) THUIR at TREC 2005: Enterprise Track.

  3. September 8, 2006: Matsuo Y, Mori J and Hamasaki M. POLYPHONET: An Advanced Social Network Extraction System from the Web. Proceedings of the WWW Conference, 2006.

  4. September 15, 2006: (Get paper from TREC 2005 proceedings).

    Cao Y., Liu J., Bao S. and Li H. Research on Expert Search at Enterprise Track of TREC 2005.

  5. September 22, 2006: (Get paper from TREC 2005 proceedings).

    Lin J., Abels E., et al. A Menagerie of Tracks at Maryland: HARD, Enterprise, QA, and Genomics, Oh My! (Focus on section 3, the Enterprise track).

  6. September 29, 2006:

    Zhang Y, Zincir-Heywood N., and Milios E. Narrative Text Classification for Automatic Key Phrase Extraction in Web Document Corpora. 7th ACM International Workshop on Web Information and Data Management (WIDM), CIKM 2005.

  7. October 3, 2006:

    Balog K., Azzopardi L., de Rijke M. Formal Models for Expert Finding in Enterprise Corpora. SIGIR 2006. (Do a Google search on the title.)

  8. October 10, 2006:

    1. Lucene

    2. Hema Raghavan, James Allan, Andrew McCallum, An Exploration of Entity Models, Collective Classification and Relation Description, Proceedings of the Second International Workshop on Link Analysis and Group Detection, LinkKDD2004, August 22, 2004 in conjunction with the tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, WA, USA, pp 1-10.

  9. October 17, 2006:

    Lucene (Brian Almquist) and a discussion of Netflix

  10. October 27, 2006: Lucene (chapter 3), led by Ritesh Nadhani

  11. November 3, 2006: Lucene

  12. November 10, 2006: Lucene

  13. November 17, 2006: Viet Ha-Thuc et al. A Fuzzy Synset-Based Hidden Markov Model for Automatic Text Segmentation. Presented by Viet Ha-Thuc.

  14. November 21, 2006: Thanksgiving

  15. November 28, 2006: Baron et al. TREC 2006 Legal Track Overview. Led by Brian Almquist (paper sent by email).

  16. December 5, 2006: Perer, Shneiderman and Oard. Using rhythms of relationships to understand e-mail archives. JASIST 2006.

  17. December 12, 2006: Song et al. Personalized Recommendation Driven by Information Flow. SIGIR 2006.