Department of Computer Science

University of Illinois at Urbana-Champaign

 


 

Yahoo!-DAIS Seminars

CS Department Colloquia

Each semester, there are departmental colloquia of interest to the DAIS community. Refer to the department seminar web pages and the Distinguished Lecturer/Entrepreneur Series web page for a complete listing of these seminars, which will usually also be announced on the DAIS mailing list described below.

The Yahoo!-DAIS Seminar (CS591MSW)

The Yahoo!-DAIS Seminar will be held on Tuesdays at 4 PM in 3403 SC. As in other semesters, a few visiting speakers will have to be scheduled on a different day or at a different time because of their travel schedules. Students who take the Yahoo!-DAIS Seminar for credit can miss up to two seminars. Speakers are announced on the DAIS mailing list (as are other items of interest to the DAIS community). It is quick and easy to subscribe to the DAIS mailing list.

Seminar schedules for past semesters: Summer 2009 | Spring 2009 | Fall 2008 | Spring 2008 | Fall 2007 | Spring 2007 | Fall 2006 | Spring 2006 | Fall 2005 | Spring 2005 | Fall 2004

Fall 2009 Schedule
Coordinator: Lu-An Tang, tang18 AT illinois.edu

Tuesday, 8/25/2009  SC 3403
4-5 PM

Title: Introduction to our DAIS group
Speaker: Prof. Marianne Winslett
Abstract: Did you know that the DAIS group has changed its name? Get caught up on this and other hot news in this introduction to the DAIS group and their research areas. As time permits, I will also give tips on how to give a good technical presentation.
Online Video: Click Here

Tuesday, 9/1/2009    SC 3403
4-5 PM

Title: Ranking-Based Clustering of Heterogeneous Information Networks with Star Network Schema 
Speaker: Yizhou Sun

Abstract: A heterogeneous information network is an information network composed of multiple types of objects. Clustering on such a network may lead to better understanding of both hidden structures of the network and the individual role played by every object in each cluster. However, although clustering on homogeneous networks has been studied over decades, clustering on heterogeneous networks has not been addressed until recently.
A recent study proposed a new algorithm, RankClus, for clustering on bi-typed heterogeneous networks. However, a real-world network may consist of more than two types, and the interactions among multi-typed objects play a key role in disclosing the rich semantics that a network carries.
In this paper, we study clustering of multi-typed heterogeneous networks with a star network schema and propose a novel algorithm, NetClus, that utilizes links across multi-typed objects to generate high-quality net-clusters. An iterative enhancement method is developed that leads to effective ranking-based clustering in such heterogeneous networks. Our experiments on DBLP data show that NetClus generates more accurate clustering results than the baseline topic model algorithm PLSA and the recently proposed algorithm, RankClus. Further, NetClus generates informative clusters, presenting good ranking and cluster membership information for each attribute object in each net-cluster.
Online Video: Link

Tuesday, 9/8/2009
SC 3403
4-5 PM

Title: Web Scale Integration, Web Scale Inspiration  
Speaker: Kevin Chang

Online Video: Link

Tuesday, 9/15/2009
SC 3403
4-5 PM

Title: From Theory to Practice: A Utility-based Methodology for Mining Object-relational Databases

Speaker: Herna L. Viktor 

Abstract: Massive object-relational databases, which are omnipresent in domains such as computational biology, law enforcement and environmental impact studies, amongst others, bring new challenges to the data mining community. There is an urgent need for novel algorithms and solutions to assist domain experts and data mining novices to understand these vast resources. Such users require direct, transparent access to their large-scale object-relational databases. Furthermore, they require the data mining exercise, and its results, to be evaluated using measures that are of high economic utility, in order to make informed, thought-through decisions.

This talk describes our ongoing work at the Intelligent Decision Support and Analysis Lab (IDeAL) on mining such object-relational databases. We detail our methodology, which includes (a) multi-view learning, (b) cluster analysis, (c) indexing and similarity search, (d) interestingness measures and (e) dimensionality reduction. The results, and lessons learned, when applying these techniques to the CAESAR(TM) anthropometric database and the Protein Data Bank (PDB), are discussed.

Short Biography: Herna L. Viktor is an associate professor at the School of IT and Engineering (SITE), University of Ottawa, Canada and the leader of the Intelligent Decision Support and Data Analysis Lab (IDeAL) at SITE. Her research focuses on the development of new methodologies for the management and data mining of large-scale object-relational databases and data warehouses. The end results of her research have been applied within the Anthropometry, Health Care and Bioinformatics domains. She holds a Ph.D. in Computer Science from the University of Stellenbosch, which she received in 1999, has published more than 90 international journal and conference articles and is on a number of international programme committees. Her research is sponsored by the Natural Sciences and Engineering Research Council of Canada (NSERC), Canada Foundation for Innovation (CFI) and the Ontario Network for Research in e-Commerce (ORNEC).

Online Video: Link

Tuesday, 9/22/2009
SC 3403
4-5 PM

Speaker: Bertram Ludaescher

Title:  Modeling and Design of Scientific Workflows with Data Assembly Lines, Provenance, and Semantic Types

Abstract:   Despite an increasing interest in scientific workflow
technologies in recent years, workflow design remains a challenging, slow,
and often error-prone process, thus limiting the speed of further adoption
of scientific workflows. Based on practical experience with data-driven
workflows, we identify and illustrate a number of recurring scientific
workflow design challenges, i.e., parameter-rich functions; data assembly,
disassembly, and cohesion; conditional execution; iteration; and, more
generally, workflow evolution. In the second part of the talk, we discuss
related workflow research issues, i.e., the importance of provenance and
data lineage in scientific workflows and the use of logic-based semantic
types in workflow design.

See http://cirss.lis.illinois.edu/BertramLecture.html  for an extended
abstract.

Short Biography: Bertram Ludäscher is a Professor of Computer Science at the Department of Computer Science and the Genome Center at the University of California, Davis. Prior to joining UC Davis, he worked at the San Diego Supercomputer Center at UC San Diego where until 2004 he was an Associate Research Scientist, leading the Knowledge-Based Information Systems lab.
Dr. Ludäscher's primary research interests are in scientific data management, in particular scientific data integration, scientific workflow management, and knowledge-based (semantic) extensions thereof. He is also interested in foundations of databases, e.g., query languages and query rewriting. He received his M.S. in Computer Science (Dipl.-Inform.) from the Technical University of Karlsruhe in 1992, and his Ph.D. (Dr.rer.nat.) in Computer Science from the University of Freiburg in 1998, both in Germany.

Online Video: Link

Tuesday, 9/29/2009
SC 3403
4-5 PM

SPEAKER: Evgeniy Gabrilovich

TITLE: The evolution of computational advertising: from heuristic ad matching to knowledge-based ad retrieval

ABSTRACT

Online advertising is the primary economic force behind numerous Internet services ranging from major Web search engines to obscure forums. A new discipline - Computational Advertising - has recently emerged, which studies the process of advertising on the Internet from a variety of angles. A successful advertising campaign should be integral to the user experience and relevant to the users' information needs, as well as economically worthwhile to the advertiser and the publisher. This talk will survey the evolution of online advertising systems, and discuss the unique challenges posed by searching the ad corpus. At first approximation, finding user-relevant ads can be reduced to conventional information retrieval. However, the complex structure of ad campaigns along with the cornucopia of pertinent non-textual information makes ad retrieval substantially (and interestingly) different. We juxtapose ad retrieval with Web search and show how to adapt standard IR methods, in particular by augmenting the ad selection process with external knowledge. We demonstrate how to enrich query representation using Web search results, and thus use the Web as a repository of relevant query-specific knowledge. We will discuss how computational advertising benefits from research in many AI areas such as machine learning, machine translation, and text summarization, and also survey some of the new problems it poses.

BIO

Evgeniy Gabrilovich is a Senior Research Scientist and Manager of the NLP & IR Group at Yahoo! Research. His research interests include information retrieval, machine learning, and computational linguistics. Recently, he organized a workshop on the synergy between user-contributed knowledge and research in AI at IJCAI'09, and a workshop on information retrieval for advertising at SIGIR'09. Evgeniy presented tutorials on computational advertising at IJCAI'09, ACL'08, and EC'08. He served on the program committees of WWW, WSDM, SIGIR, CIKM, AAAI, ACL, EMNLP, HLT, COLING, and JCDL. Evgeniy earned his MSc and PhD degrees in Computer Science from the Technion - Israel Institute of Technology. In his Ph.D. thesis, he developed a methodology for using large scale repositories of world knowledge (e.g., all the knowledge available in Wikipedia) to enhance text representation beyond the bag of words.

URL: http://research.yahoo.com/~gabr

Online Video: Link

Tuesday, 10/6/2009
SC 3403
4-5 PM

Title: CarWeb: Sharing GPS Data for Traffic Estimation and Trajectory Pattern Mining

Speaker: Chris (Wen-Chih) Peng

Abstract: Traffic information for road networks is important in our daily lives and has attracted a significant amount of research on measuring and predicting traffic status. In particular, several studies have used traffic information to determine navigation paths that minimize travel time. To obtain traffic status, one existing method is to deploy sensors along roads. However, deploying sensors on all roads is costly and is not practical for large-scale road networks. In this talk, I will present the CarWeb platform, which is implemented for sharing GPS data for traffic estimation. By exploring spatial-temporal features of traffic, we propose some algorithms to estimate traffic status. With these GPS data points, we further develop algorithms to discover trajectory patterns. Specifically, I will briefly introduce how to discover trajectory patterns from trajectory data that has random sample features. I will also present how to represent a user's trajectory profile for discovering community structures.

Short biography: Wen-Chih Peng was born in Hsinchu, Taiwan, R.O.C., in 1973. He received the BS and MS degrees from the National Chiao Tung University, Taiwan, in 1995 and 1997, respectively, and the Ph.D. degree in Electrical Engineering from the National Taiwan University, Taiwan, R.O.C., in 2001. Currently, he is an assistant professor at the Department of Computer Science, National Chiao Tung University, Taiwan. Prior to joining the Department of Computer Science, National Chiao Tung University, he was mainly involved in projects related to mobile computing, data broadcasting and network data management. Dr. Peng serves as a PC member for several prestigious conferences, such as the IEEE International Conference on Data Engineering (ICDE), the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) and Mobile Data Management (MDM). Dr. Peng is a co-organizer of the 2nd International Workshop on Privacy-Aware Location-based Mobile Services (PALMS) and is a guest editor of Signal Processing (special issue on Information Processing and Data Management in Wireless Sensor Networks). His research interests include mobile data management and data mining. He is a member of IEEE.

Online Video: Link


Tuesday, 10/13/2009
SC 3403
4-5 PM

Title: PARALLEL COORDINATES: VISUAL MULTIDIMENSIONAL GEOMETRY AND ITS APPLICATIONS

Speaker: Alfred Inselberg

Abstract: With parallel coordinates the perceptual barrier imposed by our 3-dimensional habitation is breached enabling the visualization of multidimensional problems. The highlights, interlaced with interactive demonstrations, are intuitively developed. By learning to recognize patterns a powerful knowledge discovery process evolved. It led to a deeper geometrical insight: the recognition of M-dimensional objects recursively from their (M −1)-dimensional subsets. It emerges that a hyperplane in N-dimensions is represented by (N −1) indexed points. Points representing lines have two indices, those representing planes three indices and so on. In turn, this yields powerful geometrical algorithms (e.g. for intersections, containment, proximities) and applications including classification.
A smooth surface in 3-D is the envelope of its tangent planes each represented by 2 planar points. As a result it is represented by two planar regions, and a hypersurface in N-dimensions by (N −1) regions. This is equivalent to representing a surface by its normal vectors. Developable surfaces are represented by curves revealing the surface characteristics. Convex surfaces in any dimension are recognized by hyperbola-like regions. Non-orientable surfaces yield stunning patterns unlocking new geometrical insights.
Non-convexities like folds, bumps, concavities are not hidden. The patterns persist in the presence of errors and that's good news for applications opening the way for the exploration of massive datasets. Applications of parallel coordinates include collision avoidance and conflict resolution algorithms for air traffic control (3 USA patents), computer vision (USA patent), data mining (USA patent) for data exploration and automatic classification, optimization, decision support and process control.
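
Editorial illustration (not part of the speaker's abstract): the simplest case of "points representing lines" can be checked numerically. The sketch below assumes the usual 2-D setup, with the first axis drawn at horizontal position 0 and the second at horizontal position d. Each data point (x1, x2) becomes a segment joining height x1 on the first axis to height x2 on the second, and all points on a line x2 = m*x1 + b with m != 1 yield segments that pass through one common point. The function names dual_point and segment_height_at are hypothetical and chosen only for this example.

```python
def dual_point(m, b, d=1.0):
    """Point in the parallel-coordinates plane representing x2 = m*x1 + b.

    Assumes the two parallel axes sit at horizontal positions 0 and d.
    For m == 1 the segments are parallel and meet only at infinity.
    """
    if m == 1:
        raise ValueError("slope 1 has no finite representing point")
    return (d / (1.0 - m), b / (1.0 - m))


def segment_height_at(x1, x2, x, d=1.0):
    """Height at horizontal position x of the segment drawn for point (x1, x2).

    The segment joins (0, x1) on the first axis to (d, x2) on the second;
    it is extended as a full line so x may lie outside [0, d].
    """
    return x1 + (x2 - x1) * x / d


if __name__ == "__main__":
    m, b = -2.0, 3.0
    px, py = dual_point(m, b)
    for a in (-4.0, 0.0, 5.0):
        # Every point on the line gives a segment through (px, py).
        print(a, segment_height_at(a, m * a + b, px), py)
```

For slopes between 0 and 1 the representing point falls outside the strip between the two axes, which is why the segments are treated as full lines here.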

Bio: Alfred Inselberg (AI) received a Ph.D. in Mathematics and Physics from the University of Illinois (Champaign-Urbana) in 1965 and was Research Assistant Professor until 1966. He held research positions at IBM, where he developed a Mathematical Model of the Ear (TIME Nov. 74), concurrently having joint appointments at UCLA, USC, Technion and Ben Gurion University. He has been at the School of Mathematical Sciences at Tel Aviv University since 1995. He was elected Senior Fellow at the San Diego Supercomputer Center in 1996 and Distinguished Visiting Professor at Korea University in Seoul in 2008. AI invented and developed the multidimensional system of Parallel Coordinates for which he received numerous awards and patents (on Air Traffic Control, Collision-Avoidance, Computer Vision, Data Mining). His textbook on "VISUAL Multidimensional Geometry" is being released by Springer in 2009.

Online Video: Link

Tuesday, 10/20/2009
SC 3403
4-5 PM

Title: Two Body Job Searches

Speaker: Prof. Marianne Winslett

Abstract:

If you and your significant other (SO) are both going to be looking for jobs at the same time, you may face challenges not encountered in single-body job searches. If you must also consider the needs of children or other relatives, the task becomes even more complex. Of course, if you are lucky, your SO is seeking the type of job that can be found almost anywhere; maybe he is a kindergarten teacher, a physician specializing in internal medicine, or an Oracle DBA. In this article, we describe some of the complications that you may experience if you are not so lucky: you have a PhD in computer science and your SO is in a hard-to-place job category, and the two of you would like to live in the same metropolitan area. We will discuss the questions of how wide a net to cast in sending out resumes, whether to mention your two-body situation up front, timing considerations, interviewing together or separately, cancelling interviews, and the negotiation stage. We will consider both academic and non-academic positions, with special attention to some of the trickier points of academic job searches. In addressing these questions, we will draw on our combined personal experience of two-body job hunts as recent as 2003 and as long ago as 1987, along with information gleaned from the two-body job searches of friends and colleagues.

Online Video:

Tuesday, 10/27/2009
SC 3403
4-5 PM

Title: Rated Aspect Summarization of Short Comments

Speaker: Yue Lu

Abstract:
Web 2.0 technologies have enabled more and more people to freely comment on different kinds of entities (e.g. sellers, products, services). The large scale of information poses the need and challenge of automatic summarization. In many cases, each of the user-generated short comments comes with an overall rating. In this paper, we study the problem of generating a "rated aspect summary" of short comments, which is a decomposed view of the overall ratings for the major aspects so that a user could gain different perspectives towards the target entity. We formally define the problem and decompose the solution into three steps. We demonstrate the effectiveness of our methods by using eBay sellers' feedback comments. We also quantitatively evaluate each step of our methods and study how well humans agree on such a summarization task. The proposed methods are quite general and can be used to generate a rated aspect summary automatically given any collection of short comments each associated with an overall rating.

Tuesday, 11/3/2009
SC 3403
4-5 PM

Title: Bipartite Graph-based Consensus Maximization among Supervised and Unsupervised Models

Speaker: Jing Gao

Abstract: Ensemble learning, which combines multiple diversified single models, has emerged as a powerful method for improving the robustness as well as the accuracy of both supervised and unsupervised solutions. Ensemble techniques have mostly been studied separately in the supervised and unsupervised learning communities, but they share the same basic principle: combining diversified base models strengthens weak models. Also, when both supervised and unsupervised models are available for a single task, merging all of the results leads to better performance.
In this talk, I will first present an organized picture of ensemble methods, combining the views of both supervised and unsupervised methods. Then I'll present our recent work (to appear in NIPS'09) on combining outputs from multiple supervised and unsupervised models on a set of objects for better label predictions. We aim at calculating a consolidated classification solution by maximizing the consensus among all the available models. We seek a global optimal label assignment for the target objects, which is different from the result of traditional majority voting and model combination approaches. We cast the problem into an optimization problem on a bipartite graph, where the objective function favors smoothness in the conditional probability estimates over the graph, as well as penalizes deviation from initial labeling of supervised models. We solve the problem through iterative propagation of conditional probability estimates among neighboring nodes, and interpret the method as conducting a constrained embedding in a transformed space, as well as a ranking on the graph. Experimental results on three real applications demonstrate the benefits of the proposed method over existing alternatives.

Online Video:

Tuesday, 11/10/2009
SC 3403
4-5 PM

Title: Modeling the Searcher's Irreversible Learning: Quantum-like States

Speaker: Prof. Paul Kantor

ABSTRACT:
Many evaluations of search systems use a simple binary model of relevance, and do not address the fact that the user is intelligent, and retains knowledge of what has been seen before. This can be described by saying that the user is 'irreversibly changed' by the experiences of the search process. Measurement processes are modeled, in Quantum Mechanics, by an irreversible change in the density matrix, which contains the complete description of the state.
We propose a way to adapt this language to describe the change in the state of the searcher, as more information is absorbed. This results in a 'collapse of the user's wave function' analogous to the collapse of the wave function. We describe the basic concepts, show how they can be applied to account for the diminished value of redundant information, and discuss some open problems in the representation of set-based valuations.

 

Brief BIO.
Paul Kantor's research centers on the role of information systems for storage and retrieval in a wide range of applications, with particular emphasis on rigorous evaluation of the effectiveness of such systems.
At Rutgers he is a member of the Department of Library and Information Science, the Center for Operations Research (RUTCOR), the Center for Discrete Mathematics and Computer Sciences (DIMACS) and the Graduate Faculty of the Department of Computer Science. He is a member of the American Society for Information Science and Technology (ASIST), the American Association for the Advancement of Science (AAAS), the IEEE, the American Physical Society, and the American Statistical Association. His research has been supported by such agencies as the NSF, DARPA, ARDA, the NGA, the DHS, and the US Department of Education. He was educated in Physics and Mathematics at Columbia and Princeton, has received the ASIST Research award, and is a Fellow of the AAAS. Biographical listings: Who's Who in America; Who's Who in the World.

 

Tuesday, 11/17/2009
SC 3403
4-5 PM

Title: Safely Analyzing Sensitive Network Data

Speaker: Prof. Gerome Miklau

Social and communication networks are formed by entities (such as
individuals or computer hosts) and their connections (which may be contacts,
relationships, or flows of information). Such networks are analyzed to
understand the influence of individuals in organizations, the transmission
of disease in communities, the operation of computer networks, among many
other topics. While network data can now be recorded at unprecedented
scale, releasing it can result in unacceptable disclosures about
participants and their relationships. As a result, privacy concerns are
severely constraining the dissemination of network data and disrupting the
emerging field of network science.

Our recent work investigates the properties of a network that can be
accurately studied without threatening the privacy of individuals and their
connections. We adopt the rigorous condition of differential privacy, and
develop algorithms for releasing randomly perturbed statistics about the
topology of a sensitive network. This talk will focus on two basic analysis
tasks: the estimation of the degree distribution of a network and the study
of small structural patterns that occur in a network (sometimes called motif
analysis). We show that the degree distribution of a network can be very
accurately estimated by a novel technique in which constraints are applied
to the noisy output to improve utility. This technique is of general
interest, and can be used to boost the accuracy of differentially private
output in other tasks as well. We show that studying motifs is
fundamentally harder, but can be done with acceptable accuracy if the
privacy condition is relaxed.
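
Editorial illustration (not part of the speaker's abstract): one way "constraints applied to the noisy output" can work is sketched below. The abstract does not spell out the algorithm, so the following are assumptions made for this example: privacy is edge-level, so adding or removing one edge changes two degrees by one each (L1 sensitivity 2 for the sorted degree sequence); Laplace noise is used; and the constraint is simply that a sorted degree sequence must be non-decreasing, enforced by least-squares isotonic regression (pool adjacent violators). All function names are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_sorted_degrees(degrees, epsilon, rng):
    """Sort the degrees and add Laplace noise with scale 2/epsilon.

    The factor 2 is the assumed sensitivity under edge-level privacy.
    """
    scale = 2.0 / epsilon
    return [d + laplace_noise(scale, rng) for d in sorted(degrees)]


def isotonic_nondecreasing(values):
    """Closest non-decreasing sequence in least squares (pool adjacent violators)."""
    blocks = []  # each block is [sum, count]; its fitted value is sum / count
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


def private_degree_sequence(degrees, epsilon, seed=0):
    """Noisy sorted degrees, post-processed to satisfy the ordering constraint."""
    rng = random.Random(seed)
    return isotonic_nondecreasing(noisy_sorted_degrees(degrees, epsilon, rng))


if __name__ == "__main__":
    print(private_degree_sequence([3, 1, 2, 2, 5, 1], epsilon=1.0))
```

The constraint step only post-processes the noisy output and never looks at the true degrees again, so it does not weaken the privacy guarantee of the noise step.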


Bio:

Gerome Miklau is an Assistant Professor at the University of Massachusetts,
Amherst. His primary research interest is the secure management of
large-scale data. This includes evaluating threats to privacy in published
data, devising techniques for the safe publication of social networks,
network traces, and audit logs, designing database management systems to
implement security policies, and theoretically analyzing information
disclosure. He received an NSF CAREER Award in 2007 and won the 2006 ACM
SIGMOD Dissertation Award. He received his Ph.D. in Computer Science from
the University of Washington in 2005. He earned Bachelor's degrees in
Mathematics and in Rhetoric from the University of California, Berkeley, in
1995.

Tuesday, 11/24/2009
SC 3403
4-5 PM

Thanksgiving Break

Tuesday, 12/1/2009
SC 3403
4-5 PM

Hector Garcia-Molina's visit and talk (Student Meeting)

Tuesday, 12/8/2009
SC 3403
4-5 PM

Yuan Junsong’s lecture.