Transcript Slide 1
Graph-Based Methods for Automatic Text Summarization Lin Ziheng1, Kan Min-Yen2 and Lee Wee Sun2 School of Computing, National University of Singapore 3 Science Drive 2, Singapore 117543 A paper has been submitted to and accepted by HLT-NAACL 2007 TextGraphs-2 workshop. We also participated in DUC 2007 and have submitted a paper for our system. 1. Abstract 3. Timestamped Graph Current graph-based approaches to text summarization assume static graphs. A suitable evolutionary text graph model may impart a better understanding of the texts. We propose a timestamped graph (TSG) model that is based on evolving networks and human writing and reading processes. Assumptions 1.Writers write articles from the first sentence to the last; 2.Readers read articles from the first sentence to the last. Approach Add sentences into the graph in chronological order. Suitable in modeling the growth of single documents; for multi-document, treat it as multiple instances of the single documents, which evolve in parallel. 2. Motivation No existing text graph approach that models how texts emerge. (LexRank, TextRank) Natural evolving networks. An example Definition The growth of TSG The example is just one instance of TSG with specific parameter settings. We generalize and formalize the TSG algorithm. Citation network The WWW Human writing and reading processes. A timestamped graph algorithm tsg(M) is a 9-tuple (d, e, u, f,σ, t, i, s,τ) that specifies a resulting algorithm that takes as input the set of texts M and outputs a graph G. e - #edges to add per vertex per time step; u - unweighted or weighted edges; σ- vertex selection function σ(u, G); s - skew degree. The success of graph ranking algorithms, such as PageRank. Skew degree 4. Evaluation 5. Conclusion Dataset: DUC 2005, 2006 and 2007. Evaluation tool: ROUGE. Each dataset contains 50 clusters, each cluster contains a query and 25 documents. Summarization system: (1) Graph construction phase – TSG; (2) Sentence ranking phase – PageRank; (3) Sentence extraction phase – MMR re-ranker. e=2 e=2 12th Proposed a timestamped graph model for text understanding and summarization. Applied TSG on DUC 05, 06 and 07, and achieved comparable results. Best performance achieved with specific parameter settings. N Topicsensitive Weighted edges ROUGE-1 No No 0.39358 0.07690 Yes No 0.39443 No Yes Yes Yes TSG subsumes the graphs used by LexRank and TextRank. N ROUGE-2 Skew degree ROUGE-1 ROUGE-2 0 0.36982 0.07580 0.07838 1 0.37268 0.07682 0.39823 0.08072 2 0.36998 0.07489 0.39845 0.08282 3rd 6. Future Works Results of participation in DUC 2007 Optimal performance: e = 2; topic-sensitive PageRank and weighted edges; s = 1. DUC results show: TSG is better tailored to deal with update summaries. Currently looking further on skewed timestamped graphs. Analyzing in-degree distribution of timestamped graphs. 1. Student 2. Supervisor