Seminars
CS Department Colloquia
Each semester, there are departmental colloquia of interest to the DAIS community. Refer to the department seminar web pages and the Distinguished Lecturer/Entrepreneur Series web page for a complete listing of these seminars, which will usually also be announced on the DAIS mailing list described below.
The Yahoo!-DAIS Seminar (CS591MSW)
The Yahoo!-DAIS Seminar will be held on Tuesdays at 4 PM in 3403 SC. As in other semesters, we will have a few visiting speakers who must be scheduled on a different day or at a different time because of their travel schedules. Students who take the DAIS Seminar for credit can miss up to two seminars. Speakers are announced on the DAIS mailing list (as are other items of interest to the DAIS community). It is quick and easy to subscribe to the DAIS mailing list.
Seminar schedules for past semesters: Fall 2008 | Spring 2008 | Fall 2007 | Spring 2007 | Fall 2006 | Spring 2006 | Fall 2005 | Spring 2005 | Fall 2004
Spring 2009 Schedule
|
| Tuesday, Jan 27 SC 3403 4 PM |
Title: Introduction to our DAIS group
Speaker: Prof. Marianne Winslett Abstract: Did you know that the DAIS group has changed its name? Get caught up on this and other hot news in this introduction to the DAIS group and their research areas. As time permits, I will also give tips on how to give a good technical presentation. Online Video: Link |
| Tuesday, Feb 3 SC 3403 4 PM |
Title: Improving Web Search for Difficult Queries
Speaker: Xuanhui Wang Abstract: Search engines have become essential tools in all aspects of our lives. Although a variety of information needs can be served very successfully, there are still many queries that search engines cannot answer effectively, and these queries leave users frustrated. Because users encounter such "difficult queries" quite often, improving Web search for these queries can bring significant benefits to users. However, the problem has so far been under-addressed. In this talk, I will present my work on difficult queries. I propose to study this problem from different perspectives, naturally corresponding to different stages of an interactive search process. Specifically, I propose to improve search quality for difficult queries by: (1) Effective query reformulation, i.e., improving a search engine in the stage of query formulation. A query is difficult because it does not contain the right keywords or lacks discriminative keywords. A better formulation of the query, one that addresses vocabulary mismatch or improves discrimination, can lead to better results. (2) User-oriented search result organization, i.e., improving a search engine in result presentation. Ambiguous queries are difficult and often lead to search results with mixed senses. Search result organization can make search results easily accessible for users. (3) Incorporating user negative feedback, i.e., improving a search engine by learning from user interactions. When a query is extremely difficult and all the top results (e.g., top 10) are totally irrelevant, the feedback that a user can provide would be solely negative. I propose to develop effective negative relevance feedback strategies to improve the ranking accuracy of the next few pages when the user clicks on the "Next" button. Online Video: Link |
| Tuesday, Feb 10 SC 3403 4 PM |
Title: Towards Contextual Text Mining
Speaker: Qiaozhu Mei Abstract: Text is generally associated with all kinds of contextual information. Contextual information can be explicit, such as the time and the location where a blog article is written, and the author(s) of a biomedical publication, or implicit, such as the positive or negative sentiment that an author had when he/she wrote a product review; there may also be complex context such as the social network of the authors. Many applications require analysis of patterns of topics over different contexts. For instance, analysis of search logs in the context of users can reveal how we can improve the quality of a commercial search engine by optimizing the search results according to particular users, while analysis of text in the context of a social network can facilitate discovery of more meaningful topical communities. Since contextual information affects significantly the choices of topics and words made by authors, in general, it is very important to incorporate it in analyzing and mining text data. In this talk, I will present a new paradigm of text mining, called contextual text mining, where context is treated as a "first-class citizen." I will introduce general ways of modeling and analyzing various kinds of context in text, including simple context, implicit context, and complex context, in the framework of probabilistic language models. I will show the effectiveness of these general contextual text mining techniques with a few sample applications in web search and information retrieval. Online Video: Link |
| Tuesday, Feb 17 SC 3403 4 PM |
Title: Computational Method of Predicting Functional Elements through Comparative Genomics
Speaker: Xu Ling Abstract: Comparing the genome sequences of different species has become a powerful paradigm for decoding genomic information. Because functional elements are evolutionarily constrained, conserved genome sequences and genome organization often indicate their associated functions and the underlying mechanism of genome evolution. My research focuses on two grand challenges of genomics: (1) to decode cis-regulatory modules (CRMs), noncoding DNA sequences controlling gene expression; and (2) to discover gene groups that are functionally related. In this talk, I will first present a probabilistic framework, STEMMA, for cis-regulatory module analysis. Two of the common approaches to predicting CRMs are (i) scoring spatial clusters of binding sites through the use of transcription factor binding profiles, and (ii) exploiting sequence conservation from multiple related genomes. STEMMA integrates these two into a probabilistic formalism by modeling the binding site distribution through Hidden Markov models and the cross-species comparison through stochastic evolutionary models of binding sites. In our experiments, this method is able to significantly improve prediction of regulatory sequences involved in Drosophila early development. I will also talk about a new approach for detecting functional gene groups through comparative genomics. The spatial clusters of genes conserved across multiple species are often under selective constraints and thus represent functional groups. I developed a combinatorial algorithm to detect the conserved gene clusters that dramatically improved the computational efficiency of existing algorithms, making it possible to analyze a large number of genomes. In addition, the new statistical evaluation takes into account the phylogenetic relationship among species, an important aspect that has often been missing in previous studies. Applying this approach to 133 bacterial genomes yields many new gene groups and insights into genome evolution. |
| Tuesday, Feb 24 SC 3403 4 PM |
Title: /* Leveraging Code Comments to Improve Software Reliability */
Speaker: Lin Tan Abstract: Software reliability is critically important. This work focuses on addressing fundamental challenges of software reliability: obtaining accurate program specifications and discovering the limitations of development tools and languages. In this talk, I will show that comments provide a great data source for obtaining important information, including specifications and problems of current tools and languages. I will present two new approaches, iComment and cComment, that take advantage of underutilized comments to improve software reliability. iComment automatically extracts specifications from comments to detect comment-code inconsistencies, i.e., software bugs and bad comments. Our evaluation on large real-world software, including the Linux kernel, Mozilla, and Apache, and on two types of comments shows that iComment effectively extracted 1832 specifications and detected 60 new bugs and bad comments. cComment studies comment semantics and characteristics to further understand what other comments can be utilized, how we can utilize them, and what important problems and limitations they reveal. We discovered many interesting findings that can guide the design of new languages and tools for improving reliability, programmer productivity, software evolution, etc. iComment and cComment combine techniques from different areas, including natural language processing (NLP), machine learning, information retrieval, program analysis, and statistics. Online Video: Link |
| Thursday, Mar 5 (Notice date change) SC 3403 4 PM |
Title: Information Technology and Intelligent Transportation - A Marriage Made in Heaven
Speaker: Prof. Ouri Wolfson (UI Chicago) Abstract: I will describe our NSF-sponsored IGERT PhD program in Computational Transportation Science. Computational transportation scientists will develop the next generation of intelligent transportation systems, aimed at addressing inefficiencies that cause excessive environmental pollution, fuel consumption, risk to public safety, and congestion. The trainees investigate information technologies in which millions of sensors, mobile devices such as PDAs, in-vehicle computers, and computers in the static infrastructure are integrated into a collaborative environment. Basic research in information management, communications, software architectures, modeling tools, human factors, traffic prediction, and transportation planning is being conducted to found the new discipline of Computational Transportation Science (CTS). Bio: Ouri Wolfson's main research interests are in database systems, distributed systems, and mobile/pervasive computing. He is currently the Richard and Loan Hill Professor of Computer Science at the University of Illinois at Chicago, and an affiliate professor at the University of Illinois at Urbana-Champaign. He is also the founder and former Chief Scientist of Mobitrac, a high-tech startup company that had about forty employees before being acquired. Before joining the University of Illinois, he was on the computer science faculty at the Technion and Columbia University, and he was a Member of Technical Staff at Bell Laboratories. Ouri Wolfson has authored over 150 publications and holds six patents. He is a Fellow of the Association for Computing Machinery, and serves on the editorial boards of the IEEE Transactions on Mobile Computing, J. Ross Publishing's Transportation Letters: The International Journal of Transportation Research, and Springer's Wireless Networks journal. He received the best paper award for "Opportunistic Resource Exchange in Inter-vehicle Ad Hoc Networks" at the 2004 Mobile Data Management Conference. Online Video: Link |
| Tuesday, Mar 10 SC 3403 4 PM |
Title: RankClus: Integrating Clustering with Ranking for Heterogeneous Information Network Analysis
Speaker: Yizhou Sun Abstract: As information networks become ubiquitous, extracting knowledge from information networks has become an important task. Both ranking and clustering can provide overall views on information network data, and each has been a hot topic by itself. However, ranking objects globally without considering which clusters they belong to often leads to dumb results, e.g., ranking database and computer architecture conferences together may not make much sense. Similarly, clustering a huge number of objects (e.g., thousands of authors) in one huge cluster without distinction is dull as well. In this paper, we address the problem of generating clusters for a specified type of objects, as well as ranking information for all types of objects based on these clusters in a multi-typed (i.e., heterogeneous) information network. A novel clustering framework called RankClus is proposed that directly generates clusters integrated with ranking. Based on initial K clusters, ranking is applied separately, which serves as a good measure for each cluster. Then, we use a mixture model to decompose each object into a K-dimensional vector, where each dimension is a component coefficient with respect to a cluster, which is measured by rank distribution. Objects then are reassigned to the nearest cluster under the new measure space to improve clustering. As a result, quality of clustering and ranking are mutually enhanced, which means that the clusters are getting more accurate and the ranking is getting more meaningful. Such a progressive refinement process iterates until little change can be made. Our experiment results show that RankClus can generate more accurate clusters and in a more efficient way than the state-of-the-art link-based clustering methods. Moreover, the clustering results with ranks can provide more informative views of data compared with traditional clustering. Online Video: Link |
| Tuesday, Mar 17 SC 3403 4 PM |
Title: Using Temporal and Transactional Information to Improve Predictive Models for Relational Networks (Canceled)
Speaker: Prof. Jennifer Neville (Purdue University) Abstract: Many relational domains contain temporal information and transactional dynamics that are important to model. As an example, consider scientific publication networks: paper publication events occur over time, and coauthor relationships form and develop over time. The temporal aspects of the data can be used to identify relevant relationships and/or as an indication of relationship strength. For example, people who coauthor frequently are more likely to share research interests than people who coauthor infrequently. Also, a paper is more likely to share the topics of its references that were published in the recent past than those that were published in the distant past. Although many relational datasets contain this type of temporal information, past work in relational learning has focused primarily on modeling static “snapshots” of the data and has largely ignored temporal dynamics. In this work, we focus on modeling temporally-varying relationships in predictive models of both attributes and link strength. By analyzing the temporal dynamics and transactional patterns, we aim to identify and emphasize more influential relationships. First, we will introduce a framework for modeling dynamic relational data with a two-phase process, which summarizes the temporal-relational information with kernel smoothing and then moderates attribute dependencies with the summarized relational information. We evaluate our approach on three real-world datasets and show that it results in significant performance gains compared to two baseline approaches that ignore the temporal aspects of the data. Second, we investigate a supervised learning approach for predicting link strength from transactional information. We compare the utility of attribute-based, topological, and transactional features on public data from the Purdue Facebook network. Our results show that we can accurately predict strong relationships, and that transactional-network features are the most influential features for this task. Bio: Jennifer Neville is an assistant professor at Purdue University. She received her PhD from the University of Massachusetts Amherst in 2006. She received a DARPA IPTO Young Investigator Award in 2003 and was selected as a member of the DARPA Computer Science Study Group in 2007. Recently she was chosen by IEEE as one of "AI's 10 to watch" for 2008. Her research focuses on data mining techniques for relational and network domains. Online Video: Link |
| Tuesday, Mar 24 SC 3403 4 PM |
Spring Break, No Seminar. |
| Tuesday, Mar 31 SC 3403 4 PM |
Title: Text Information Management: Challenges and Opportunities
Speaker: Prof. Chengxiang Zhai Abstract: Recent years have seen an explosive growth of text data in multiple domains, notably on the Web, demanding powerful tools for managing and exploiting text information. While relatively mature technologies have been developed for managing structured data by the database community, there are still many challenges to be solved in managing the unstructured text data even though a lot of research progress has been made by the information retrieval community in the past decades. Due to the difficulty in precisely understanding natural language and users' information needs, text information management poses significant challenges and requires collaborative research by multiple communities especially information retrieval, natural language processing, databases, machine learning, and data mining. In this talk, I will review the state of the art of text information management and discuss the major challenges in developing general frameworks, algorithms, and systems for managing text information effectively and efficiently. I will present several interdisciplinary research directions where multiple communities can be expected to collaborate with each other to generate high impact research results. Online Video: Link |
| Tuesday, Apr 7 SC 3403 4 PM |
Title: Privacy - from accessing databases to location based services
Speaker: Prof. Johann-Christoph Freytag (Humboldt-University) Abstract: Over the last few years it has become apparent that privacy issues are becoming more and more important when accessing data sources, either on the Web or through database management systems. That is, the user wants to hide not only the query but also the result of that query from others. In the past, the problem of querying a database privately was solved by organizational rather than technical means. In this talk we describe the problem of querying databases privately more formally and discuss existing solutions from the area of private information retrieval (PIR). Their lack of efficiency and scalability motivated us to look for alternative approaches using a so-called “secure co-processor” (built by IBM). We introduce a set of algorithms that take advantage of the (physical) properties of the co-processor and show which algorithms are necessary to guarantee privacy for database queries. In the last part of my talk I briefly describe our vision of how to extend the current privacy approach to location-based services, in particular to moving objects such as vehicles (cars). Bio: Johann-Christoph Freytag is a full professor for databases and information systems (DBIS) at the Computer Science Department of the Humboldt-Universität zu Berlin, Germany. Before joining the department in 1994 he was a research staff member at the IBM Almaden Research Center (1985-1987), a researcher at the European Computer-Industry-Research Centre (ECRC, in Munich, Germany, 1987-1989), and the head of Digital's (DEC) Database Technology Center (also in Munich, 1990-1993). He holds a Ph.D. in Applied Mathematics/Computer Science from Harvard University, MA. Dr. Freytag's research interests include all aspects of query processing and query optimization in object-relational database systems, new developments in the database area (such as semi-structured data, data quality, databases and security, Semantic Web), privacy in database systems, mobile systems and mobility, and applying database technology to applications such as GIS, genomics, and bioinformatics/life science. Dr. Freytag spent two sabbaticals at IBM Research and IBM Development (1997, 2001) and was a regular visitor to Microsoft Research and the SQLServer group, Redmond, as a research scientist (2002, 2005, 2007, 2008). In recent years he has received the IBM Faculty Award four times for collaborative work in the areas of databases, middleware, and bioinformatics/life science. He was a member of the VLDB Endowment until 2007, organizing VLDB 2003 in Berlin. He has headed the German database interest group of the GI (Gesellschaft für Informatik) since 2007. Online Video: Link |
| Tuesday, Apr 14 SC 3403 4 PM |
Title: Comparative genomics, regression and thermodynamic modeling of cis-regulatory sequences
Speaker: Prof. Saurabh Sinha Abstract: Gene regulation in metazoan development is carried out through ~1000 bp long sequences near genes, called cis-regulatory modules. My group works on computational approaches to discovery and analysis of cis-regulatory modules, one of the grand challenges of genomics today. We aim to improve the specificity of such approaches by (i) scoring spatial clusters of strong as well as weak binding sites, rather than focusing on individual sites separately, and (ii) exploiting evolutionary conservation patterns from multiple related genomes. The former is achieved through probabilistic formalisms called Hidden Markov models, while the cross-species comparison is based on evolutionary models. When comparing two moderately diverged species, our method deals with alignment uncertainties in a principled manner, and explicitly models binding site loss/gain that is commonly observed in orthologous regulatory sequences. On-going work extends this two-species analysis framework to multiple species comparison, in the process making various tradeoffs between computational efficiency and biological realism. Going beyond considering spatial clustering of binding sites, our most recent work studies how the specific combination of binding sites in a cis-regulatory module maps to the function of (the gene expression pattern driven by) the module. This leads us to a logistic regression model of gene expression from sequence, which is shown to explain the expression pattern of over 70% of the modules in our test set. The ability to predict the function of a cis-regulatory module allows us to infer the gene regulatory network, as well as to predict novel modules for experimental validation. Finally, I will present on-going work on a more advanced model of regulatory function, built from fundamental thermodynamic principles. 
This approach is aimed at distinguishing between possible mechanisms of transcription factor-DNA interaction by analysis of sequence and gene expression data. Preliminary studies are able to quantify the role of cooperative interactions between transcription factors in modulating DNA binding. Online Video: Link |
| Tuesday, Apr 21 SC 3403 4 PM |
Title: Private Queries in Location Based Services: Anonymizers are not Necessary
Speaker: Dr. Gabriel Ghinita (Purdue University) Abstract: Mobile devices equipped with positioning capabilities (e.g., GPS) can ask location-dependent queries to Location Based Services (LBS). To protect privacy, the user location must not be disclosed. Existing solutions utilize a trusted anonymizer between the users and the LBS. This approach has several drawbacks: (i) All users must trust the third party anonymizer, which is a single point of attack. (ii) A large number of cooperating, trustworthy users is needed. (iii) Privacy is guaranteed only for a single snapshot of user locations; users are not protected against correlation attacks (e.g., history of user movement). We propose a novel framework to support private location-dependent queries, based on the theoretical work on Private Information Retrieval (PIR). Our framework does not require a trusted third party, since privacy is achieved via cryptographic techniques. Compared to existing work, our approach achieves stronger privacy for snapshots of user locations; moreover, it is the first to provide provable privacy guarantees against correlation attacks. We use our framework to implement approximate and exact algorithms for nearest-neighbor search. We optimize query execution by employing data mining techniques, which identify redundant computations. Contrary to common belief, the experimental results suggest that PIR approaches incur reasonable overhead and are applicable in practice. Bio: Gabriel Ghinita is currently a Postdoctoral Research Associate with the Dept. of Computer Science, Purdue University. He holds a PhD degree in Computer Science from the National University of Singapore. Gabriel's research interests focus on access control for collaborative environments, and privacy for spatial and relational data. In the past, he held visiting scientist appointments with the Hong Kong University, and the Chinese University of Hong Kong. Gabriel has served as an invited reviewer for prestigious conferences and journals, such as VLDB, ICDE, TKDE and ACM GIS. Online Video: Link |
| Tuesday, Apr 28 SC 3403 4 PM |
Title: SIGMOD Record's Distinguished Profiles in Databases Column
Speaker: Prof. Jiawei Han Abstract: Online Video: Link |
| Tuesday, May 5 SC 3403 4 PM |
Title: Fake Picassos, Tampered History, and Digital Forgery: Protecting the Genealogy of Bits with Secure Provenance
Speaker: Ragib Hasan Abstract: As increasing amounts of valuable information are produced and persist digitally, the ability to determine the origin of data becomes important. In science, medicine, commerce, and government, data provenance tracking is essential for rights protection, regulatory compliance, management of intelligence and medical data, and authentication of information as it flows through workplace tasks. While significant research has been conducted in this area, the associated security and privacy issues have not been explored, leaving provenance information vulnerable to illicit alteration as it passes through untrusted environments. In this talk, we show how to provide strong integrity and confidentiality assurances for data provenance information in an untrusted distributed environment. We describe our provenance-aware system prototype that implements provenance tracking of data writes at the application layer, which makes it extremely easy to deploy. We present empirical results that show that, for typical real-life workloads, the run-time overhead of our approach to recording provenance with confidentiality and integrity guarantees ranges from 1% to 13%. For more details, please refer to http://dais.cs.uiuc.edu/provenance Online Video: Link |