
Department of Computer Science

University of Illinois at Urbana-Champaign

 


Fall 2009 Reading List for the DAIS Qualifying Examination

I. Information Retrieval

  • Basic concepts
    • Vector-space retrieval model, TF-IDF weighting, relevance/pseudo feedback, query expansion, mean average precision (MAP), normalized discounted cumulative gain (NDCG), query-likelihood retrieval model, language model smoothing, PageRank, inverted index
  • Background
    • Amit Singhal, Modern Information Retrieval: A Brief Overview, IEEE Data Engineering Bulletin 24(4), pages 35-43, 2001.

Link: http://singhal.info/ieee2001.pdf

    • Chris Manning, Prabhakar Raghavan, Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008. (Chapter 8 Evaluation in IR, Chapter 21 Link Analysis)

Link: http://nlp.stanford.edu/IR-book/pdf/irbookonlinereading.pdf

    • ChengXiang Zhai, Statistical Language Models for Information Retrieval, Morgan and Claypool Publishers, 2008. (Chapter 1 Introduction, Chapter 2 Overview of IR Models, and Chapter 3 Simple Query Likelihood Retrieval Model).

Link: http://www.morganclaypool.com/doi/abs/10.2200/S00158ED1V01Y200811HLT001

  • More advanced topics
    • J. Zhu, J. Wang, I. Cox, and M. Taylor, Risky business: modeling and exploiting uncertainty in information retrieval, Proceedings of ACM SIGIR 2009, pp. 99-106.

Link: http://doi.acm.org/10.1145/1571941.1571961

    • B. Carterette, On rank correlation and the distance between rankings, Proceedings of ACM SIGIR 2009, pp. 436-443.

Link: http://doi.acm.org/10.1145/1571941.1572017

    • E. Diemert, G. Vandelle, Unsupervised Query Categorization using Automatically-Built Concept Graphs, Proceedings of WWW 2009.

Link: http://www2009.org/proceedings/pdf/p461.pdf

    • J. Liu, Y. Cao, C-Y Lin, Y. Huang, M. Zhou, Low-Quality Product Review Detection in Opinion Summarization, Proceedings of 2007 EMNLP/CoNLL, pp. 334-342.

Link: http://www.aclweb.org/anthology/D/D07/D07-1035.pdf

II. Data Mining and Data Warehousing

  • Basic Concepts
    • Data warehousing: star schema, data cube (be able to list half a dozen typical data cube computation methods), multi-dimensional analysis (OLAP)
    • Data mining: frequent pattern mining (be able to list half a dozen typical methods), sequential pattern mining (be able to list four or five typical methods), correlation analysis, classification (be able to list half a dozen typical methods), clustering (be able to list half a dozen typical methods)
  • Background
    • J. Han and M. Kamber, Data Mining: Concepts and Techniques, 2nd edition. Chapters 3 & 4 (for data warehousing); Chapters 2, 5-7 (for data mining). Morgan Kaufmann 2006.
  • More advanced topics
    • Data Warehousing:
      1. Prediction cubes. Chen, Chen, Lin, and Ramakrishnan. VLDB 2005. [pdf]
      2. ARCube: Supporting Ranking Aggregate Queries in Partially Materialized Data Cubes. Wu, Xin, and Han. SIGMOD 2008. [pdf]
    • Data Mining:
      1. Y. Sun, et al., "Ranking-Based Clustering of Heterogeneous Information Networks with Star Network Schema," Proc. 2009 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'09), Paris, France, June 2009.
      2. SCAN: A Structural Clustering Algorithm for Networks. Xu et al.  KDD 2007. [acm]
      3. Direct Discriminative Pattern Mining for Effective Classification. Cheng, Yan, Han, and Yu. ICDE 2008. [pdf]

III. Database Management Systems

  • Basic concepts
    • Hardware: disk sector, track, block, seek, latency, how to lay out a database page
    • Data modeling: ER, OO, and Object-Relational approaches
    • Concurrency control and recovery: ACID, serializability, two-phase locking, two-phase commit, logging and recovery, the impact of data replication
    • Theory: normalization, dependencies
    • Queries: access methods (hashing, B-trees, multidimensional access methods), how to optimize a query, SQL
    • Benchmarks: TPC-C and TPC-H
  • Background
    You can use any database textbook you like to study the most basic of the concepts listed above; for example, CS411 teaches these concepts. Note that you will be expected to demonstrate your understanding of the concepts by applying them, as opposed to simply being able to define them. In the remaining entries, "RDS" refers to Stonebraker's Readings in Database Systems.
    • Generalized Search Trees for Database Systems. Hellerstein et al.  VLDB 1995 and RDS. [pdf] We include this paper as the reference for multidimensional access methods; access methods based on B-trees and hashing should be covered in any database textbook.
    • New TPC Benchmarks for Decision Support and Web Commerce. Poess and Floyd. SIGMOD Record 29(4), December 2000. [pdf]
    • Inclusion of New Types in Relational Data Base Systems. Stonebraker. ICDE 1986 and RDS. [acm] We include this paper as your reference for understanding the impact of extensibility (as, for example, intended by the object-relational model) on a DBMS.
  • More advanced topics
    Please note that databases are a very broad field. The papers listed here will be changed frequently, to reflect this breadth.
    • Database Core
      • Scalable Approximate Query Processing with the DBO Engine. Jermaine, Arumugam, Pol, and Dobra.  SIGMOD 2007. [acm]
      • Compiling Mappings to Bridge Applications and Databases. Melnik, Adya, and Bernstein.  SIGMOD 2007. [acm]
    • Information Systems
      • Scalable Semantic Web Data Management Using Vertical Partitioning.  Abadi, Marcus, Madden, and Hollenbach.  VLDB 2007. [pdf]
      • iTrails: Pay-as-you-go Information Integration in Dataspaces. Salles et al.  VLDB 2007. [pdf]

IV. Bioinformatics

  • Basic Concepts
    • Sequence alignment
    • Motif finding and regulatory sequence analysis
    • Gene prediction
    • DNA sequencing
    • Phylogenetic tree reconstruction
    • Gene expression analysis
    • Clustering of biological data
  • Background
    • Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, by Durbin, Eddy, Krogh, and Mitchison. Read Chapters 2 (Pairwise alignment), 3 (Markov chains and hidden Markov models), and 8.1-8.5 (Probabilistic approaches to phylogeny).
  • More advanced topics

    • Bradley RK, Roberts A, Smoot M, Juvekar S, Do J, et al. 2009. Fast Statistical Alignment. PLoS Comput Biol 5(5): e1000392. doi:10.1371/journal.pcbi.1000392

Link: http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1000392

    • Kundaje A, Xin X, Lan C, Lianoglou S, Zhou M, et al. 2008. A Predictive Model of the Oxygen and Heme Regulatory Network in Yeast. PLoS Comput Biol 4(11): e1000224. doi:10.1371/journal.pcbi.1000224

Link: http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1000224

    • Edelman EJ, Guinney J, Chi J-T, Febbo PG, Mukherjee S. 2008. Modeling Cancer Progression via Pathway Dependencies. PLoS Comput Biol 4(2): e28. doi:10.1371/journal.pcbi.0040028

Link: http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.0040028


DAIS - Database and Information Systems Laboratory, Department of Computer Science, University of Illinois at Urbana-Champaign, 201 N. Goodwin Ave., Urbana, IL 61801, USA.  Fax: 217-265-6494, Phone: 217-244-6241.