Department of Computer Science

University of Illinois at Urbana-Champaign


Yahoo!-DAIS Seminars

CS Department Colloquia

Each semester, there are departmental colloquia of interest to the DAIS community. Refer to the department seminar web pages and the Distinguished Lecturer/Entrepreneur Series web page for a complete listing of these seminars, which will usually also be announced on the DAIS mailing list described below.

The Yahoo!-DAIS Seminar (CS591MSW)

The Yahoo!-DAIS Seminar will be held on Tuesdays at 4 PM in 3403 SC. As in other semesters, a few visiting speakers must be scheduled on a different day or at a different time because of their travel schedules. Students who take the Yahoo!-DAIS Seminar for credit can miss up to two seminars. Speakers are announced on the DAIS mailing list (as are other items of interest to the DAIS community). It is quick and easy to subscribe to the DAIS mailing list.

Seminar schedules for past semesters: Fall 2009 | Summer 2009 | Spring 2009 | Fall 2008 | Spring 2008 | Fall 2007 | Spring 2007 | Fall 2006 | Spring 2006 | Fall 2005 | Spring 2005 | Fall 2004

Spring 2010 Schedule
Coordinator: Lu-An Tang, tang18 AT illinois.edu

Tuesday, 1/19/2010

SC 3403
4-5 PM

Title: Data-oriented Content Query System: Searching for Data into Text on the Web

Speaker: MianWei Zhou
Abstract: As the Web provides rich data embedded in the immense contents inside pages, we witness many ad-hoc efforts for exploiting fine granularity information across Web text, such as Web information extraction, typed-entity search, and question answering. To unify and generalize these efforts, we propose a general search system--Data-oriented Content Query System (DoCQS)--to search directly into document contents for finding relevant values of desired data types. Motivated by the current limitations, we start by distilling the essential capabilities needed by such content querying. The capabilities call for a conceptually relational model, upon which we design a powerful Content Query Language (CQL). For efficient processing, we design novel index structures and query processing algorithms. We evaluate our proposal over two concrete domains of realistic Web corpora, demonstrating that our query language is rather flexible and expressive, and our query processing is efficient with reasonable index overhead.

Video Link

Tuesday, 1/26/2010
SC 3403
4-5 PM

Title: Generating Comparative Summaries of Contradictory Opinions in Text
Speaker: Hyun Duk Kim
Abstract: In this talk, I will present a study of a novel summarization problem called contrastive opinion summarization (COS). Given two sets of positively and negatively opinionated sentences, which are often the output of an existing opinion summarizer, COS aims to extract comparable sentences from each set of opinions and generate a comparative summary containing a set of contrastive sentence pairs. We formulate the task as an optimization problem and propose two general methods for generating a comparative summary within this framework, both of which rely on measuring the content similarity and contrastive similarity of two sentences. We study several strategies for computing these two similarities. We also create a test data set for evaluating this novel summarization problem. Experimental results on this test set show that the proposed methods are effective for generating comparative summaries of contradictory opinions. In addition, we implemented two demo systems that illustrate the usefulness of the algorithm.

Video Link

Tuesday, 2/2/2010
SC 3403
4-5 PM

 

Title: Efficient Information Extraction over Evolving Text

Speaker: Fei Chen

Abstract:
Information extraction (IE) programs automatically extract structured data such as company names and locations from text corpora. Most current IE approaches have considered only static text corpora, over which we typically apply IE programs only once. However, many real-world text corpora such as Wikipedia are evolving: documents can be added, deleted and modified. Therefore, to keep extracted information up to date, we often must apply IE programs repeatedly, to consecutive corpus snapshots. How can we execute such repeated IE efficiently?

In this talk, I will present solutions for efficient IE over evolving text. The key idea underlying these solutions is to recycle previous IE results, given that consecutive corpus snapshots often contain much overlapping text. I will discuss two systems that successively exploit more recycling opportunities. The first system, Cyclex, treats an entire IE program as a black box and recycles its IE results. The second system, Delex, exploits the compositional nature of many real-world IE programs and recycles the intermediate IE results as well. Finally, I will present experimental results on two real-life datasets (DBLife and Wikipedia) to demonstrate the utility of these solutions.
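The recycling idea in the abstract can be illustrated with a minimal sketch. It is not the Cyclex or Delex method: it assumes recycling at whole-document granularity only, keyed on a content hash, and treats the IE program as a black-box function. The names `extract_with_recycling`, `cache`, and `extractor` are hypothetical.

```python
import hashlib
from typing import Callable, Dict, List


def extract_with_recycling(
    snapshot: Dict[str, str],
    cache: Dict[str, List[str]],
    extractor: Callable[[str], List[str]],
) -> Dict[str, List[str]]:
    """Run a black-box IE program over one corpus snapshot.

    Assumption: a document whose text is byte-for-byte unchanged since an
    earlier snapshot yields the same extractions, so its cached result is
    reused instead of calling the extractor again.
    """
    results: Dict[str, List[str]] = {}
    for doc_id, text in snapshot.items():
        # Key the cache on the content, not the document id, so renamed or
        # duplicated documents also reuse earlier results.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in cache:
            cache[key] = extractor(text)
        results[doc_id] = cache[key]
    return results
```

Calling the function on consecutive snapshots with the same `cache` dictionary invokes `extractor` only for new or modified documents. A system that reuses results for overlapping regions inside a modified document would need finer-grained matching than this sketch provides.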

Video Link

Tuesday, 2/9/2010
SC 3403
4-5 PM

Title: CETR - Content Extraction via Tag Ratios

Speaker: Tim Weninger

Abstract: We present Content Extraction via Tag Ratios (CETR) - a method to extract content text from diverse webpages by using the HTML document's tag ratios. We describe how to compute tag ratios on a line-by-line basis and then cluster the resulting histogram into content and non-content areas. Initially, we find that the tag ratio histogram is not easily clustered because of its one-dimensionality; therefore we extend the original approach in order to model the data in two dimensions. Next, we present a tailored clustering technique which operates on the two-dimensional model, and then evaluate our approach against a large set of alternative methods using standard accuracy, precision and recall metrics on a large and varied Web corpus. Finally, we show that, in most cases, CETR achieves better content extraction performance than existing methods, especially across varying web domains, languages and styles.
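The line-by-line tag ratio computation can be sketched as follows. The abstract does not define the ratio, so this sketch assumes it is the number of non-tag characters on a line divided by the number of tags on that line, with the tag count floored at 1 to avoid division by zero. The function name `tag_ratios` and the regular expression are illustrative, and the clustering step described in the abstract is not shown.

```python
import re
from typing import List

# Rough approximation of an HTML tag; a real HTML parser would be more robust.
TAG_RE = re.compile(r"<[^>]*>")


def tag_ratios(html: str) -> List[float]:
    """Return one tag ratio per line of the input HTML.

    Assumed definition: (characters outside tags) / max(number of tags, 1).
    """
    ratios: List[float] = []
    for line in html.splitlines():
        tag_count = len(TAG_RE.findall(line))
        text_chars = len(TAG_RE.sub("", line).strip())
        ratios.append(text_chars / max(tag_count, 1))
    return ratios
```

Under this assumed definition, a navigation line that is mostly markup gets a low ratio, while a paragraph of article text wrapped in one or two tags gets a high ratio, which is what makes the resulting histogram a candidate for separating content from non-content lines.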

 

Tuesday, 2/16/2010
SC 3403
4-5 PM

 

Tuesday, 2/23/2010
SC 3403
4-5 PM

Speaker: Mourad Ouzzani

Tuesday, 3/2/2010
SC 3403
4-5 PM

 

Tuesday, 3/9/2010
SC 3403
4-5 PM

 

Tuesday, 3/16/2010
SC 3403
4-5 PM

Title:
Speaker: Yuanhua Lv (Tentative)
Abstract:

Tuesday, 3/23/2010
SC 3403
4-5 PM

Spring Break

Tuesday, 3/30/2010
SC 3403
4-5 PM

Title:
Speaker: Tao Cheng (Tentative)
Abstract:

Tuesday, 4/6/2010
SC 3403
4-5 PM

 

Tuesday, 4/13/2010
SC 3403
4-5 PM

Speaker: Xin Jin

Tuesday, 4/20/2010
SC 3403
4-5 PM

Speaker: Yue Lu

Tuesday, 4/27/2010
SC 3403
4-5 PM

 

Tuesday, 5/4/2010
SC 3403
4-5 PM

Final Reading Day