Seminars
CS Department Colloquia
Each semester, there are departmental colloquia of interest to the DAIS community. Refer to the department seminar web pages and the Distinguished Lecturer/Entrepreneur Series web page for a complete listing of these seminars, which will usually also be announced on the DAIS mailing list described below.
The Yahoo!-DAIS Seminar (CS591MSW)
The Yahoo!-DAIS Seminar will be held on Tuesdays at 4 PM in 3403 SC. As in other semesters, we will have a few visiting speakers who must be scheduled on a different day or at a different time because of their travel schedules. Students who take the DAIS Seminar for credit can miss up to two seminars. Speakers are announced on the DAIS mailing list (as are other items of interest to the DAIS community). It is quick and easy to subscribe to the DAIS mailing list.
Seminar schedules for past semesters: Fall 2007 | Spring 2007 | Fall 2006 | Spring 2006 | Fall 2005 | Spring 2005 | Fall 2004
Spring 2008 Schedule
|
| Wednesday, January 16 SC 3403 4 PM |
Title: Security and Integrity in Outsourcing of Data Mining
Speaker: Prof. David Cheung Abstract: Outsourcing of data mining to an outside service provider brings important benefits to the data owner. These include (i) relief from the high mining cost, (ii) minimization of demands on resources, and (iii) effective centralized mining for multiple distributed owners. However, security and integrity are issues that must be tackled before enterprises can outsource data mining tasks. The service provider should be prevented from accessing the actual data (security), and the results returned to the owner must be authentic (integrity). In this talk, we will first describe a model of the security and integrity problems in the outsourcing of data mining to a third-party service provider. A recent result on secure association rule mining will be used to explain the outsourcing model and to illustrate the feasibility of the approach we used. To protect security in mining association rules, a substitution cipher technique was proposed for the encryption of transactional data. After identifying the non-trivial threats to a straightforward one-to-one item mapping substitution cipher, we propose a novel secure encryption algorithm based on a one-to-n item mapping that transforms transactions non-deterministically, yet guarantees correct decryption. We will also discuss the integrity problem in the same outsourcing model for association mining.
Bio: Professor David Wai-lok Cheung is Head of the Department of Computer Science and Director of the Center for E-commerce Infrastructure Development (CECID) at The University of Hong Kong. He is an active researcher in database, data mining, and e-commerce technologies. His recent research covers security and integrity in outsourcing of data mining, data interoperability theory and XML schema transformation, projected clustering, sequential OLAP, and semantic query and searching.
He has published over 100 technical articles, many of which appeared in leading venues such as the SIGMOD, VLDB, ICDE, and KDD conferences and the ACM TODS, ACM TKDD, and IEEE TKDE journals. He was the recipient of the HKU Outstanding Researcher Award. He was the program chairman of the 2001 and 2005 Pacific-Asia Knowledge Discovery and Data Mining Conferences and the conference chairman of the 2007 PAKDD Conference. He was the Program Vice Chair of ICDM 2006 and Program Chair of HKICC 2003. In applied research, under the directorship of Professor Cheung, CECID has received prestigious grants from the Hong Kong SAR Government totaling HK$40M. He and his team have developed an open-source ebXML gateway used by developers from more than 80 countries. The open-source product has received prominent awards at the Hong Kong 2004 IT Excellence Awards competition, the 2004 Asia-Pacific ICT Awards competition, and the 2005 Linux Business Awards competition. |
| Tuesday, January 22 SC 3403 4 PM |
Title: DAIS Research Showcase
Speaker: Kevin Chang, Jiawei Han, Marianne Winslett, ChengXiang Zhai Abstract: |
| Tuesday, January 29 SC 3403 4 PM |
Title: Towards Accurate and Efficient Classification: A Frequent and Discriminative Pattern-based Approach
Speaker: Hong Cheng Abstract: Classification is a core method widely studied in machine learning, statistics, and data mining. Many classification methods have been proposed in the literature, most of which assume that the input data is in a feature vector representation. However, in many applications, it is desirable to construct accurate classification models on complex structural data that has no initial feature vector representation, including transaction data, sequences, graphs, semi-structured data, and text data. A primary question is how to construct a discriminative and compact feature set on the basis of which classification can be performed directly. A concrete example of complex structural data classification is classifying chemical compounds into various classes (e.g., toxic vs. nontoxic, active vs. inactive), where a key challenge is how to construct discriminative graph features. While simple features such as atoms and links are too simple to preserve the structural information, graph kernel methods make it hard to interpret the classifiers. My goal is to use discriminative frequent patterns to characterize complex structural data and thus enhance the classification power. Motivated by this idea, I developed a framework of discriminative frequent pattern-based classification that could lead to a highly accurate, efficient, and interpretable classifier on complex data. |
| Tuesday, February 5 SC 3403 4 PM |
Title: Towards Practical and Secure Decentralized Attribute-Based Authorization Systems
Speaker: Adam J. Lee Abstract: The ubiquity of the Internet has led to increased sharing of computational and data resources amongst large numbers of users. As traditional identity-based solutions to the authorization problem do not scale to such large numbers of users, novel attribute-based access control systems based on techniques such as trust negotiation and other forms of distributed proving have been proposed. To date, research in these areas has been largely of a theoretical nature and has produced many important foundational results. However, if these techniques are to be safely deployed in practice, the systems-level barriers hindering their adoption must be overcome. In this talk, we will show that safely and securely adopting trust negotiation technologies is not simply a matter of implementation and deployment, but requires careful consideration of both formal properties and practical issues. We will then describe our theoretical and systems work on reducing the overheads of the policy compliance checking process. In particular, we will discuss Clouseau, a policy compiler that translates access control policies written in existing policy languages into constraint patterns that can be analyzed using a pattern-matching, rather than theorem proving, approach. This approach vastly improves the runtime efficiency of the compliance checking process over more traditional approaches, thereby making it practical to check compliance with non-trivial policies and investigate the design of more scalable server-side trust negotiation implementations. |
| Tuesday, February 12 SC 3403 4 PM |
Title: Learn from Web Search Logs to Organize Search Results
Speaker: Xuanhui Wang Abstract: Effective organization of search results is critical for improving the utility of any search engine. Clustering search results is an effective way to organize them, allowing a user to navigate to relevant documents quickly. However, two deficiencies keep this approach from always working well: (1) the clusters discovered do not necessarily correspond to the interesting aspects of a topic from the user's perspective; and (2) the cluster labels generated are not informative enough to allow a user to identify the right cluster. In this paper, we propose to address these two deficiencies by (1) learning "interesting aspects" of a topic from Web search logs and organizing search results accordingly; and (2) generating more meaningful cluster labels using past query words entered by users. We evaluate our proposed method on commercial search engine log data. Compared with traditional methods of clustering search results, our method gives better result organization and more meaningful labels. |
| Tuesday, February 19 SC 3403 4 PM |
Title: On Dominating Your Neighborhood Profitably
Speaker: Cuiping Li (Associate Professor of Renmin University) Abstract: Recent research on skyline queries has attracted much interest in the database and data mining community. Given a database, an object belongs to the skyline if it cannot be dominated with respect to the given attributes by any other database object. Current methods have only considered so-called min/max attributes like price and quality, which a user wants to minimize or maximize. However, objects can also have spatial attributes like x, y coordinates, which can be used to represent relevant constraints on the query results. In this talk, I will introduce novel skyline query types that take into account not only min/max attributes but also spatial attributes and the relationships between these different attribute types. Such queries support a micro-economic approach to decision making, considering not only the quality but also the cost of solutions.
Bio: Cuiping Li is an Associate Professor in the Department of Computer Science at Renmin University of China. Her recent research covers various aspects of data analysis techniques including skyline/dominance relationship analysis, data warehouse and multidimensional analysis, ranking processing techniques, clustering, outlier detection, frequent pattern discovery, and classification. She has published over 10 papers, including three top-level conference papers (KDD’04, SIGMOD’06 and VLDB’07). She was also the winner of the Beijing Excellent Person Award 2007, conferred by the Beijing City Government. |
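The dominance test that defines a skyline in the abstract above can be sketched in a few lines of Python. This is only an illustration of the classic min-attribute definition, not the speaker's method: the function names `dominates` and `skyline`, the hotel tuples, and the convention that smaller values are better on every attribute are all assumptions made for the example, and the spatial query types introduced in the talk are not covered.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a dominates b: a is no worse than b on every
    attribute and strictly better on at least one (assumption: smaller
    values are better for all attributes)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Naive O(n^2) skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical data: (price, distance to the beach); both are to be minimized.
hotels = [(50, 8), (60, 3), (80, 1), (70, 5), (90, 2)]
result = skyline(hotels)
print(result)
```

In this made-up data set, (70, 5) is dominated by (60, 3) and (90, 2) is dominated by (80, 1), so only the remaining three hotels are on the skyline. The quadratic scan is chosen for clarity; practical skyline algorithms avoid comparing every pair.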
| Tuesday, February 26 SC 3403 4 PM |
Title: Mining Massive Moving Object Datasets: From RFID Data Flow Analysis to Traffic Mining
Speaker: Hector Gonzalez Abstract: Recent years have witnessed an enormous increase in moving object data originating from applications such as RFID-enabled systems in supply chain operations and traffic monitoring applications in large road networks. Effective exploration and analysis of such data will have an important impact on areas such as business process optimization, city planning, and national security. In order to realize these benefits, we need to develop techniques to cope with the enormous size, complex spatiotemporal structure, and high noise levels of this data. The goal of my research is to develop efficient methods to manage, warehouse, and mine moving object data. In my talk, I will present two themes in this direction. The first, in the context of supply chain operations, is the development of a method to compress and warehouse very large RFID data sets, based on the following observations: (1) lossless data compression can be achieved by removing sensor-generated redundancy and exploiting the group object movements observed in many applications, and (2) spatiotemporal characteristics of the data can be preserved for OLAP analysis at multiple abstraction levels by using a hierarchical object identification schema. The second theme, in the context of traffic mining, deals with the problem of route recommendations on large road networks. The proposed method differs from previous approaches in that it takes into consideration driving patterns mined from the data to find routes that are not only fast but also favored by actual drivers. |
| Tuesday, March 4 SC 3403 4 PM |
Title: Canceled
Speaker: Abstract: |
| Wednesday, March 12 SC 3403 4 PM |
(Please note the date change)
Title: Statistical Network Analysis and Inference: Methods and Applications
Speaker: Eric Xing (Assistant Professor of Carnegie Mellon University) Abstract: Exploring the statistical properties and hidden characteristics of network entities, and the stochastic processes behind temporal evolution of network topologies, is essential for computational knowledge discovery and prediction based on network data from biology, social sciences, and various other fields. In this talk, I first discuss a hierarchical Bayesian framework that combines the mixed membership model and the stochastic blockmodel for inferring latent multi-facet roles of nodes in networks, and for estimating stochastic relationships (i.e., cooperativeness or antagonisms) between roles. Then I discuss a new formalism for modeling network evolution over time based on temporal exponential random graphs, and an MCMC algorithm for posterior inference of the latent time-specific networks. The proposed methodology makes it possible to reverse-engineer the latent sequence of temporally rewiring networks given longitudinal measurements of node attributes, such as intensities of gene expressions or social metrics of actors, even when only a single snapshot of such measurements resulting from each (time-specific) network is available.
Bio: Eric Xing is an assistant professor in the Machine Learning Department, the Language Technologies Institute, and the Computer Science Department within the School of Computer Science at Carnegie Mellon University. His principal research interests lie in the development of machine learning and statistical methodology, especially for building quantitative models and predictive understandings of the evolutionary mechanism, regulatory circuitry, and developmental processes of biological systems, and for building computational intelligence systems involving automated learning, reasoning, and decision-making in open, evolving possible worlds.
Professor Xing received his B.S. in Physics from Tsinghua University, his first Ph.D. in Molecular Biology and Biochemistry from Rutgers University, and his second Ph.D. in Computer Science from UC Berkeley. He has been a member of the faculty at CMU since 2004, and his current work involves 1) graphical models, Bayesian methodologies, inference algorithms, and optimization techniques for analyzing and mining high-dimensional, longitudinal, and relational data; 2) computational and comparative genomic analysis of biological sequences, systems biology investigation of gene regulation, and statistical analysis of genetic variation, demography, and disease linkage; and 3) application of statistical learning in social networks, text/image mining, vision, and machine translation. |
| Tuesday, March 18 SC 3403 4 PM |
Spring Break. No seminar |
| Tuesday, March 25 SC 3403 4 PM |
Canceled. |
| Tuesday, April 1 SC 3403 4 PM |
Title: Communication and Social Interaction
Speaker: Tony Bergstrom and Eric Gilbert Abstract: Social systems for mediated communication encompass a wide range of interactions. Email, forums, virtual worlds, SMS, blogs, and many other frameworks have been built to support social communities and communication. This talk offers a quick glance at a number of Social Spaces projects, ranging from how people in rural and urban settings differ in their use of technology, to how visualization of activity can be used to reveal patterns in open source communities, to how augmenting collocated interaction with new visual cues can alter perception of conversation. How can computing power be utilized to provide, facilitate, and analyze meaningful interactions? |
| Tuesday, April 8 SC 3403 4 PM |
Title: Opinion Integration Through Semi-supervised Topic Modeling
Speaker: Yue Lu Abstract: Web 2.0 technology has enabled more and more people to freely express their opinions on the Web, making the Web an extremely valuable source for mining user opinions about all kinds of topics. In this paper we study how to automatically integrate opinions expressed in a well-written expert review with the many opinions scattered across various sources such as blog spaces and forums. We formally define this new integration problem and propose to use semi-supervised topic models to solve the problem in a principled way. Experiments on integrating opinions about two quite different topics (a product and a political figure) show that the proposed method is effective for both topics and can generate useful aligned integrated opinion summaries. The proposed method is quite general. It can be used to integrate a well-written review with opinions in an arbitrary text collection about any topic to potentially support many interesting applications in multiple domains.
Title: Modeling with Network Regularization
Speaker: Qiaozhu Mei Abstract: In this paper, we formally define the problem of topic modeling with network structure (TMN). We propose a novel solution to this problem, which regularizes a statistical topic model with a harmonic regularizer based on a graph structure in the data. The proposed method combines topic modeling and social network analysis, and leverages the power of both statistical topic models and discrete regularization. The output of this model can summarize topics in text well, map a topic onto the network, and discover topical communities. With appropriate instantiations of the topic model and the graph-based regularizer, our model can be applied to a wide range of text mining problems such as author-topic analysis, community discovery, and spatial text mining.
Empirical experiments on two data sets with different genres show that our approach is effective and outperforms both text-oriented methods and network-oriented methods alone. The proposed model is general; it can be applied to any text collection with a mixture of topics and an associated network structure. |
| Tuesday, April 15 SC 3403 4 PM |
Title: Multidimensional Analysis of Moving Object Data
Speaker: Xiaolei Li Abstract: The collection of historical or real-time data on moving objects is quickly becoming a ubiquitous task. With the help of GPS devices, RFID sensors, RADAR, satellites, and other technologies, mobile objects of all sizes, whether a tiny cellphone or a giant ocean liner, can be easily tracked around the globe. Many fundamental problems in the database field have found their parallels in the moving object domain. They include indexing and query processing of moving objects over static or continuous queries and similarity search between moving objects. The same has happened with data mining problems as well. Clustering of moving objects is one popular topic; spatial association patterns are another. However, even with the recent attention, there are still many unexplored areas in moving objects research. Specifically, higher semantic level problems remain mostly untouched. One example is anomaly detection. With the ever-increasing focus on video surveillance, many cities are tracking and analyzing vehicles as they move throughout the city. With the ultimate goal of automated reporting and alerting, sophisticated algorithms are needed to evaluate the moving object trajectories. Furthermore, associations with other multi-dimensional features will need to be considered as well. Another example is periodic traffic pattern detection. Everyone is familiar with rush hour traffic in big cities, but extracting and representing such patterns in an efficient and concise manner has not been addressed. To this end, we present our studies in this thesis. With regard to anomaly detection, we present three models to automatically detect moving object anomalies, traffic anomalies, and subspace anomalies. The last of these detects anomalies in a multidimensional space, which is often the case in real-world datasets.
Additionally, we address problems that could occur due to sampling in a multidimensional space and how to summarize moving object trajectories for more efficient processing. |
| Tuesday, April 22 SC 3403 4 PM |
Title: TBA
Speaker: Alexandre Klementiev Abstract: TBA |
| Tuesday, April 29 SC 3403 4 PM |
Title: TBA
Speaker: Tianyi Wu Abstract: TBA |
DAIS - Database and Information Systems Laboratory, Department of Computer Science, University of Illinois at Urbana-Champaign, 201 N. Goodwin Ave., Urbana, IL 61801, USA. Fax: 217-265-6494, Phone: 217-244-6241. |