Yahoo!-DAIS Seminars
CS Department Colloquia
Each semester, there are departmental colloquia of
interest to the DAIS community. Refer to the department seminar web pages and
the Distinguished
Lecturer/Entrepreneur Series web page for a complete listing of these
seminars, which will usually also be announced on the DAIS mailing list
described below.
The Yahoo!-DAIS Seminar (CS591MSW)
The Yahoo!-DAIS Seminar will be held on Tuesdays at 4 PM
in SC 3403. As in other semesters, we will have a few visiting speakers who
must be scheduled on a different day or at a different time because of their travel schedules.
Students who take the Yahoo!-DAIS Seminar for credit can miss up to two
seminars. Speakers are announced on the DAIS mailing list (as are other items
of interest to the DAIS community). It is quick and easy to subscribe to the DAIS
mailing list.
Seminar schedules for past semesters: Summer 2009 | Spring 2009 | Fall 2008 | Spring 2008 | Fall 2007 | Spring 2007 | Fall 2006 | Spring 2006 | Fall 2005 | Spring 2005 | Fall 2004
Fall 2009 Schedule
Coordinator: Lu-An Tang, tang18 AT illinois.edu
|
Tuesday, 8/25/2009
SC 3403
4-5 PM
|
Title: Introduction to our DAIS group
Speaker: Prof. Marianne Winslett
Abstract: Did you know that the DAIS group has changed its name? Get
caught up on this and other hot news in this introduction to the DAIS group
and its research areas. As time permits, I will also
give tips on how to give a good technical presentation.
Online Video: Link
|
|
Tuesday, 9/1/2009 SC 3403
4-5 PM
|
Title: Ranking-Based Clustering of Heterogeneous Information Networks with Star Network Schema
Speaker: Yizhou Sun
Abstract: A heterogeneous information network is an information
network composed of multiple types of objects. Clustering on such a network
may lead to a better understanding of both the hidden structures of the network
and the individual role played by every object in each cluster. However,
although clustering on homogeneous networks has been studied for decades,
clustering on heterogeneous networks has only recently been
addressed.
A recent study proposed a new algorithm, RankClus, for clustering on
bi-typed heterogeneous networks. However, a real-world network may consist
of more than two types, and the interactions among multi-typed objects play
a key role in disclosing the rich semantics that a network carries.
In this paper, we study clustering of multi-typed heterogeneous networks
with a star network schema and propose a novel algorithm, NetClus, that
utilizes links across multi-typed objects to generate high-quality
net-clusters. An iterative enhancement method is developed that leads to
effective ranking-based clustering in such heterogeneous networks. Our
experiments on DBLP data show that NetClus generates more accurate
clustering results than the baseline topic model algorithm PLSA and the
recently proposed algorithm, RankClus. Further, NetClus generates
informative clusters, presenting good ranking and cluster membership information
for each attribute object in each net-cluster.
Online Video: Link
|
|
Tuesday, 9/8/2009
SC 3403
4-5 PM
|
Title: Web Scale Integration, Web Scale Inspiration
Speaker: Kevin Chang
Online Video: Link
|
|
Tuesday, 9/15/2009
SC 3403
4-5 PM
|
Title: From Theory to Practice: A Utility-based Methodology for Mining Object-relational Databases
Speaker: Herna L. Viktor
Abstract: Massive object-relational databases,
which are omnipresent in domains such as computational biology, law
enforcement and environmental impact studies, amongst others, bring new
challenges to the data mining community. There is an urgent need for novel
algorithms and solutions to assist domain experts and data mining novices
to understand these vast resources. Such users require direct, transparent
access to their large-scale object-relational databases. Furthermore, they
require the data mining exercise, and its results, to be evaluated using
measures that are of high economic utility, in order to make informed,
thought-through decisions.
This talk describes our ongoing work at the Intelligent Decision
Support and Analysis Lab (IDeAL) on mining such object-relational
databases. We detail our methodology, which includes (a) multi-view
learning, (b) cluster analysis, (c) indexing and similarity search, (d)
interestingness measures, and (e) dimensionality reduction. We also discuss the
results of, and lessons learned from, applying these techniques to the CAESAR™
anthropometric database and the Protein Data Bank (PDB).
Short Biography: Herna L. Viktor is an associate professor at the School of IT
and Engineering (SITE), University of Ottawa, Canada,
and the leader of the Intelligent Decision Support and Data Analysis Lab
(IDeAL) at SITE. Her research focuses on the
development of new methodologies for the management and data mining of
large-scale object-relational databases and data warehouses. The
results of her research have been applied in the anthropometry, health
care, and bioinformatics domains. She received her Ph.D. in Computer Science from the University of Stellenbosch
in 1999, has published more than 90 international
journal and conference articles, and serves on a number of international
programme committees. Her research is sponsored by the Natural Sciences
and Engineering Research Council of Canada (NSERC), the Canada Foundation for
Innovation (CFI), and the Ontario Network for Research in e-Commerce
(ORNEC).
Online Video: Link
|
|
Tuesday, 9/22/2009
SC 3403
4-5 PM
|
Speaker: Bertram Ludaescher
Title: Modeling and Design of Scientific Workflows with Data Assembly Lines, Provenance, and Semantic Types
Abstract: Despite increasing interest in scientific workflow
technologies in recent years, workflow design remains a challenging, slow,
and often error-prone process, thus limiting the speed of further adoption
of scientific workflows. Based on practical experience with data-driven
workflows, we identify and illustrate a number of recurring scientific
workflow design challenges, namely parameter-rich functions; data assembly,
disassembly, and cohesion; conditional execution; iteration; and, more
generally, workflow evolution. In the second part of the talk, we discuss
related workflow research issues, namely the importance of provenance and
data lineage in scientific workflows and the use of logic-based semantic
types in workflow design.
See http://cirss.lis.illinois.edu/BertramLecture.html for an extended abstract.
Short Biography: Bertram Ludäscher
is a Professor of Computer Science at the Department of Computer Science
and the Genome Center at the University
of California, Davis. Prior to joining UC Davis, he
worked at the San Diego Supercomputer Center at UC San Diego where until
2004 he was an Associate Research Scientist, leading the Knowledge-Based
Information Systems lab.
Dr. Ludäscher's primary research interests are in scientific data
management, in particular scientific data integration, scientific workflow
management, and knowledge-based (semantic) extensions thereof. He is also
interested in foundations of databases, e.g., query languages and query
rewriting. He received his M.S. in Computer Science (Dipl.-Inform.) from
the Technical University of Karlsruhe in 1992, and his Ph.D. (Dr.rer.nat.)
in Computer Science from the University
of Freiburg in 1998, both in Germany.
Online Video: Link
|
|
Tuesday, 9/29/2009
SC 3403
4-5 PM
|
Speaker: Evgeniy Gabrilovich
Title: The Evolution of Computational Advertising: From Heuristic Ad Matching
to Knowledge-Based Ad Retrieval
Abstract:
Online advertising is the primary
economic force behind numerous Internet services ranging from major Web
search engines to obscure forums. A new discipline, Computational
Advertising, has recently emerged to study the process of
advertising on the Internet from a variety of angles. A successful
advertising campaign should be integral to the user experience and relevant
to the users' information needs, as well as economically worthwhile to the
advertiser and the publisher. This talk will survey the evolution of online
advertising systems and discuss the unique challenges posed by searching
the ad corpus. To a first approximation, finding user-relevant ads can be
reduced to conventional information retrieval. However, the complex
structure of ad campaigns along with the cornucopia of pertinent
non-textual information makes ad retrieval substantially (and
interestingly) different. We juxtapose ad retrieval with Web search and
show how to adapt standard IR methods, in particular by augmenting the ad
selection process with external knowledge. We demonstrate how to enrich
query representation using Web search results, and thus use the Web as a
repository of relevant query-specific knowledge. We will discuss how
computational advertising benefits from research in many AI areas such as
machine learning, machine translation, and text summarization, and also
survey some of the new problems it poses.
Bio:
Evgeniy Gabrilovich is a Senior
Research Scientist and Manager of the NLP & IR Group at Yahoo!
Research. His research interests include information retrieval, machine
learning, and computational linguistics. Recently, he organized a workshop
on the synergy between user-contributed knowledge and research in AI at IJCAI'09,
and a workshop on information retrieval for advertising at SIGIR'09.
Evgeniy presented tutorials on computational advertising at IJCAI'09,
ACL'08, and EC'08. He has served on the program committees of WWW, WSDM,
SIGIR, CIKM, AAAI, ACL, EMNLP, HLT, COLING, and JCDL. Evgeniy earned his
MSc and PhD degrees in Computer Science from the Technion - Israel
Institute of Technology. In his Ph.D. thesis, he developed a methodology
for using large-scale repositories of world knowledge (e.g., all the
knowledge available in Wikipedia) to enhance text representation beyond the
bag of words.
URL: http://research.yahoo.com/~gabr
Online Video: Link
|
|
Tuesday, 10/6/2009
SC 3403
4-5 PM
|
Title: CarWeb: Sharing GPS Data for Traffic Estimation and
Trajectory Pattern Mining
Speaker: Chris (Wen-Chih) Peng
Abstract: Traffic
information about road networks is important in our daily lives and has
attracted a significant amount of research on measuring and
predicting traffic status. Using such information,
several studies have focused on determining navigation paths that
minimize travel time. To obtain
traffic status, one existing method is to deploy sensors along roads.
However, deploying sensors on all roads is costly and thus impractical
for large-scale road networks. In this talk, I will present the CarWeb
platform, which was built for sharing GPS data for traffic estimation.
By exploiting the spatio-temporal features of traffic, we propose
algorithms to estimate traffic status. Using these GPS data
points, we also develop algorithms to discover trajectory patterns.
Specifically, I will briefly introduce how to discover trajectory patterns
from randomly sampled trajectory data. I will also present how to represent
users' trajectory profiles in order to discover community structures.
Short biography:
Wen-Chih Peng was born in Hsinchu, Taiwan,
R.O.C., in 1973. He received the BS and MS degrees from the National Chiao
Tung University, Taiwan, in 1995 and 1997, respectively, and the Ph.D.
degree in Electrical Engineering from the National Taiwan University,
Taiwan, R.O.C., in 2001. Currently, he is an assistant professor in the
Department of Computer Science, National Chiao Tung University, Taiwan.
Prior to joining the Department of Computer Science, National Chiao Tung University,
he was mainly involved in projects related to mobile computing, data
broadcasting, and network data management. Dr. Peng has served on the program committees of
several prestigious conferences, such as the IEEE International Conference on
Data Engineering (ICDE), the Pacific-Asia Conference on Knowledge Discovery and Data Mining
(PAKDD), and the International Conference on Mobile Data Management (MDM). Dr. Peng is a co-organizer of the 2nd
International Workshop on Privacy-Aware Location-based Mobile Services
(PALMS) and is a guest editor of Signal Processing (special issue on
Information Processing and Data Management in Wireless Sensor Networks).
His research interests include mobile data management and data mining. He
is a member of the IEEE.
Online Video: Link
|
|
Tuesday, 10/13/2009
SC 3403
4-5 PM
|
Title: Parallel Coordinates: Visual Multidimensional Geometry and Its Applications
Speaker: Alfred Inselberg
Abstract: With parallel
coordinates, the perceptual barrier imposed by our 3-dimensional habitation
is breached, enabling the visualization of multidimensional problems. The
highlights, interlaced with interactive demonstrations, are intuitively
developed. Learning to recognize patterns led to a powerful knowledge discovery
process, which in turn led to a deeper geometrical insight: the recognition
of M-dimensional objects recursively from their (M − 1)-dimensional
subsets. It turns out that a hyperplane in N dimensions is represented by N − 1
indexed points. Points representing lines have two indices, those
representing planes three indices, and so on. In turn, this yields powerful
geometrical algorithms (e.g., for intersections, containment, and proximities)
and applications including classification.
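The base case of this point-line duality can be checked directly in two dimensions. The sketch below is an illustration, not the speaker's code: it assumes the two parallel axes sit at horizontal positions 0 and d, and the helper names `dual_point` and `segment_height` are hypothetical. A point (a1, a2) is drawn as the line through (0, a1) and (d, a2); for every point on the line x2 = m·x1 + b with m ≠ 1, these lines all cross at (d/(1 − m), b/(1 − m)), so the whole line is represented by a single point.

```python
from fractions import Fraction


def dual_point(m, b, d=1):
    """Return the parallel-coordinates image of the line x2 = m*x1 + b.

    Assumption: axis X1 is at horizontal position 0 and axis X2 at position d.
    A slope of 1 maps to a point at infinity, so it is rejected here.
    """
    if m == 1:
        raise ValueError("slope 1 maps to an ideal point at infinity")
    m, b, d = Fraction(m), Fraction(b), Fraction(d)
    return (d / (1 - m), b / (1 - m))


def segment_height(a1, a2, x, d=1):
    """Height at horizontal position x of the line through (0, a1) and (d, a2),
    i.e., the (extended) polyline that represents the 2-D point (a1, a2)."""
    a1, a2, x, d = Fraction(a1), Fraction(a2), Fraction(x), Fraction(d)
    return a1 + (a2 - a1) * x / d


if __name__ == "__main__":
    m, b = -1, 3  # the line x2 = -x1 + 3
    px, py = dual_point(m, b)
    print(px, py)  # 1/2 3/2
    # Every point on the line yields a polyline passing through (px, py).
    for a1 in range(-2, 3):
        a2 = m * a1 + b
        assert segment_height(a1, a2, px) == py
```

Exact `Fraction` arithmetic is used so the crossing can be tested with `==` rather than a tolerance. The dual point falls between the two axes only when m < 0; for other slopes it lies on the extensions of the polylines outside the strip.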
A smooth surface in 3-D is the envelope of its tangent planes, each
represented by two planar points. As a result, it is represented by two planar
regions, and a hypersurface in N dimensions by N − 1 regions. This
is equivalent to representing a surface by its normal vectors. Developable
surfaces are represented by curves that reveal the surface characteristics.
Convex surfaces in any dimension are recognized by hyperbola-like regions.
Non-orientable surfaces yield stunning patterns that unlock new geometrical
insights.
Non-convexities such as folds, bumps, and concavities are not hidden. The patterns
persist in the presence of errors, which is good news for applications,
opening the way for the exploration of massive datasets. Applications of
parallel coordinates include collision avoidance and conflict resolution
algorithms for air traffic control (3 U.S. patents), computer vision (U.S.
patent), data mining (U.S. patent) for data exploration and automatic classification, optimization,
decision support, and process control.
Bio: Alfred Inselberg
(AI) received a Ph.D. in Mathematics and Physics from the University of Illinois (Champaign-Urbana) in 1965
and was a Research Assistant Professor until 1966. He held research positions
at IBM, where he developed a Mathematical Model of the Ear (TIME, Nov. '74),
concurrently holding joint appointments at UCLA, USC, Technion, and Ben Gurion University.
He has been at the School of Mathematical Sciences at Tel Aviv University since
1995. He was elected Senior Fellow at the San Diego Supercomputer
Center in 1996 and Distinguished
Visiting Professor at Korea University in Seoul in 2008. AI invented and developed
the multidimensional system of Parallel Coordinates, for which he received
numerous awards and patents (on Air Traffic Control, Collision Avoidance,
Computer Vision, and Data Mining). His textbook on "VISUAL
Multidimensional Geometry" is being released by Springer in 2009.
Online Video: Link
|
|
Tuesday, 10/20/2009
SC 3403
4-5 PM
|
Title: Two Body Job Searches
Speaker: Prof. Marianne Winslett
Abstract:
If you and your significant other (SO) are both going
to be looking for jobs at the same time, you
may face challenges not encountered in single-body job searches. If you
must also consider the needs of children or other relatives, the task
becomes even more complex. Of course, if you are lucky, your SO is seeking
the type of job that can be found almost anywhere; maybe he is a
kindergarten teacher, a physician specializing in internal medicine, or an
Oracle DBA. In this talk, we describe some of the complications that you
may experience if you are not so lucky: you have a PhD in computer science
and your SO is in a hard-to-place job category, and the two of you would
like to live in the same metropolitan area. We will discuss the questions
of how wide a net to cast in sending out resumes, whether to mention your
two-body situation up front, timing considerations, interviewing together
or separately, cancelling interviews, and the negotiation stage. We will
consider both academic and non-academic positions, with special attention
to some of the trickier points of academic job searches. In addressing
these questions, we will draw on our combined personal experience of
two-body job hunts as recent as 2003 and as long ago as 1987, along with
information gleaned from the two-body job searches of friends and
colleagues.
Online Video:
|
|
Tuesday, 10/27/2009
SC 3403
4-5 PM
|
Title: Rated Aspect Summarization of Short Comments
Speaker: Yue Lu
Abstract:
Web 2.0 technologies have enabled more and more people to freely comment on
different kinds of entities (e.g., sellers, products, services). The large
scale of this information creates both a need for and a challenge in automatic
summarization. In many cases, each of the user-generated short comments
comes with an overall rating. In this paper, we study the problem of
generating a "rated aspect summary" of short comments, which is a
decomposed view of the overall ratings for the major aspects so that a
user can gain different perspectives on the target entity. We
formally define the problem and decompose the solution into three steps.
We demonstrate the effectiveness of our methods by using eBay sellers'
feedback comments. We also quantitatively evaluate each step of our methods
and study how well humans agree on such a summarization task. The proposed
methods are quite general and can be used to generate a rated aspect summary
automatically given any collection of short comments, each associated with
an overall rating.
|
|
Tuesday, 11/3/2009
SC 3403
4-5 PM
|
Title: Bipartite Graph-based Consensus Maximization among Supervised and Unsupervised Models
Speaker: Jing Gao
Abstract: Ensemble learning, which
combines multiple diverse single models, has emerged as a powerful
method for improving both the robustness and the accuracy of
supervised and unsupervised solutions. Ensemble techniques have
mostly been studied separately in the supervised and unsupervised learning communities,
but they share the same basic principle, i.e., combining
diverse base models strengthens weak models. Also, when both supervised
and unsupervised models are available for a single task, merging all of the
results leads to better performance.
In this talk, I will first present an organized picture of ensemble methods
combining the views of both supervised and unsupervised methods. Then I'll
present our recent work (to appear in NIPS'09) on combining outputs from
multiple supervised and unsupervised models on a set of objects for better
label predictions. We aim at calculating a consolidated classification
solution by maximizing the consensus among all the available models. We
seek a globally optimal label assignment for the target objects, which is
different from the result of traditional majority voting and model
combination approaches. We cast the problem into an optimization problem on
a bipartite graph, where the objective function favors smoothness in the conditional
probability estimates over the graph, as well as penalizes deviation
from initial labeling of supervised models. We solve the problem through
iterative propagation of conditional probability estimates among
neighboring nodes, and interpret the method as conducting a constrained
embedding in a transformed space, as well as a ranking on the graph.
Experimental results on three real applications demonstrate the benefits of
the proposed method over existing alternatives.
Online Video:
|
|
Tuesday, 11/10/2009
SC 3403
4-5 PM
|
Title: Modeling the Searcher's Irreversible Learning: Quantum-like States
Speaker: Prof. Paul Kantor
Abstract:
Many evaluations of search systems use a simple binary model of relevance,
and do not address the fact that the user is intelligent, and retains knowledge of what has been
seen before. This can be described by saying that the user is 'irreversibly
changed' by the experiences of the search process. Measurement processes
are modeled, in Quantum Mechanics, by an irreversible change in the density
matrix, which contains the complete description of the state.
We propose a way to adapt this language to describe the change in the state
of the searcher, as more information is absorbed. This results in a
'collapse of the user's wave function' analogous to the collapse of the
wave function in a quantum measurement. We describe the basic concepts, show how they can be applied
to account for the diminished value of redundant information, and discuss
some open problems in the
representation of set-based valuations.
Bio:
Paul Kantor's research centers on the role of information systems for storage
and retrieval in a wide range of applications, with particular emphasis on
rigorous evaluation of the effectiveness of such systems.
At Rutgers he is a member of the Department of Library and Information Science,
the Center for Operations Research (RUTCOR), the Center for Discrete
Mathematics and Theoretical Computer Science (DIMACS), and the Graduate Faculty of the
Department of Computer Science. He is a member of the American Society for
Information Science and Technology (ASIST), the American Association for
the Advancement of Science (AAAS), the IEEE, the American Physical Society, and the
American Statistical Association. His research has been supported by such
agencies as the NSF, DARPA, ARDA, the NGA, the DHS, and the US Department
of Education. He was educated in Physics and Mathematics at Columbia and Princeton,
has received the ASIST Research award, and is a Fellow of
the AAAS. Biographical listings: Who's Who in America; Who's Who in the
World.
|
|
Tuesday, 11/17/2009
SC 3403
4-5 PM
|
Title: Safely Analyzing Sensitive Network Data
Speaker: Prof. Gerome Miklau
Abstract: Social and communication networks are formed by entities (such as
individuals or computer hosts) and their connections (which may be contacts,
relationships, or flows of information). Such networks are analyzed to
understand the influence of individuals in organizations, the transmission
of disease in communities, the operation of computer networks, among many
other topics. While network data can now be recorded at unprecedented
scale, releasing it can result in unacceptable disclosures about
participants and their relationships. As a result, privacy concerns are
severely constraining the dissemination of network data and disrupting the
emerging field of network science.
Our recent work investigates the properties of a network that can be
accurately studied without threatening the privacy of individuals and their
connections. We adopt the rigorous condition of differential privacy, and
develop algorithms for releasing randomly perturbed statistics about the
topology of a sensitive network. This talk will focus on two basic analysis
tasks: the estimation of the degree distribution of a network and the study
of small structural patterns that occur in a network (sometimes called motif
analysis). We show that the degree distribution of a network can be very
accurately estimated by a novel technique in which constraints are applied
to the noisy output to improve utility. This technique is of general
interest, and can be used to boost the accuracy of differentially private
output in other tasks as well. We show that studying motifs is
fundamentally harder, but can be done with acceptable accuracy if the
privacy condition is relaxed.
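The abstract does not spell out how the constraints are applied. One well-known instance of the idea, sketched below under stated assumptions, releases the sorted degree sequence with Laplace noise and then projects the noisy values back onto non-decreasing sequences with isotonic regression (the pool-adjacent-violators algorithm). The sketch assumes edge-level differential privacy with sensitivity 2 for the sorted degree sequence (one added or removed edge changes two degrees by one each); the function names are hypothetical, and this is not presented as the speaker's implementation.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def isotonic_nondecreasing(values):
    """Pool-adjacent-violators: least-squares fit constrained to be non-decreasing."""
    blocks = []  # each block is [sum, count]; its fitted value is sum / count
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


def private_sorted_degrees(degrees, epsilon, seed=None):
    """Noisy sorted degree sequence, post-processed to respect the sort order.

    Assumption: edge-level differential privacy with sensitivity 2 for the
    sorted degree sequence, so each entry gets Laplace(2 / epsilon) noise.
    """
    rng = random.Random(seed)
    noisy = [d + laplace_noise(2 / epsilon, rng) for d in sorted(degrees)]
    # Post-processing a differentially private output does not weaken the guarantee.
    return isotonic_nondecreasing(noisy)


if __name__ == "__main__":
    degrees = [1, 1, 2, 2, 3, 5, 8]
    print(private_sorted_degrees(degrees, epsilon=0.5, seed=7))
```

Because the isotonic step only post-processes the noisy release, it spends no additional privacy budget; the ordering constraint lets runs of noisy entries be averaged together, which is one plausible source of the accuracy gain the abstract describes.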
Bio:
Gerome Miklau is an Assistant Professor at the University of Massachusetts,
Amherst. His primary research interest is the secure management of
large-scale data. This includes evaluating threats to privacy in published
data, devising techniques for the safe publication of social networks,
network traces, and audit logs, designing database management systems to
implement security policies, and theoretically analyzing information
disclosure. He received an NSF CAREER Award in 2007 and won the 2006 ACM
SIGMOD Dissertation Award. He received his Ph.D. in Computer Science from
the University of Washington in 2005. He earned Bachelor's degrees in
Mathematics and in Rhetoric from the University of California, Berkeley, in
1995.
|
|
Tuesday, 11/24/2009
SC 3403
4-5 PM
|
Thanksgiving Break
|
|
Tuesday, 12/1/2009
SC 3403
4-5 PM
|
Hector Garcia-Molina's visit and talk (Student Meeting)
|
|
Tuesday, 12/8/2009
SC 3403
4-5 PM
|
Yuan Junsong’s lecture.
|
|