Number of records found: 80

Hobson, Stacy President; Dorr, Bonnie J.; Monz, Christof; Schwartz, Richard

Task-based evaluation of text summarization using Relevance Prediction

Information Processing & Management, 2007, Vol. 43 Issue 6, p1482-1499

Online (04/2008) (Only UGR)

Abstract: This article introduces a new task-based evaluation measure called Relevance Prediction that is a more intuitive measure of an individual's performance on a real-world task than interannotator agreement. Relevance Prediction parallels what a user does in the real-world task of browsing a set of documents using standard search tools, i.e., the user judges relevance based on a short summary, and then that same user (not an independent user) decides whether to open (and judge) the corresponding document. This measure is shown to be a more reliable measure of task performance than LDC Agreement, a current gold-standard-based measure used in the summarization evaluation community. Our goal is to provide a stable framework within which developers of new automatic measures may make stronger statistical statements about the effectiveness of their measures in predicting summary usefulness. We demonstrate, as a proof-of-concept methodology for automatic metric developers, that a current automatic evaluation measure has a better correlation with Relevance Prediction than with LDC Agreement and that the significance level for detected differences is higher for the former than for the latter. (DB)

Automatic summaries; evaluation; Relevance Prediction
Assessment |
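
The following is a minimal, illustrative sketch (in Python) of the kind of measure the abstract above describes: a Relevance Prediction-style agreement rate, i.e. the fraction of documents for which a user's summary-based relevance judgment matches that same user's judgment of the full document, correlated with an automatic metric's scores. All system names, judgments, and scores below are hypothetical, and the code is not the authors' implementation.

    # Illustrative sketch only: a Relevance Prediction-style agreement rate,
    # computed per system from hypothetical per-user judgments, then correlated
    # with a hypothetical automatic metric. Not the authors' implementation.
    from statistics import correlation  # Pearson's r (Python 3.10+)

    def relevance_prediction(judgments):
        """judgments: list of (summary_judgment, document_judgment) pairs,
        where both judgments come from the *same* user (True = relevant).
        Returns the fraction of pairs on which the two judgments agree."""
        agree = sum(1 for s, d in judgments if s == d)
        return agree / len(judgments)

    # Hypothetical data: for each system, the same users judged summaries
    # and then the corresponding full documents.
    per_system_judgments = {
        "system_A": [(True, True), (True, False), (False, False), (True, True)],
        "system_B": [(True, False), (False, False), (False, True), (True, True)],
        "system_C": [(True, True), (False, False), (True, True), (False, False)],
    }
    # Hypothetical automatic-metric scores for the same systems.
    automatic_scores = {"system_A": 0.42, "system_B": 0.31, "system_C": 0.47}

    systems = sorted(per_system_judgments)
    rp_scores = [relevance_prediction(per_system_judgments[s]) for s in systems]
    auto_scores = [automatic_scores[s] for s in systems]

    # Correlation between the task-based measure and the automatic measure,
    # analogous to the comparison described in the abstract.
    print(dict(zip(systems, rp_scores)))
    print("Pearson r:", correlation(auto_scores, rp_scores))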
|
|
|
|
MANI, Inderjeet |
|
Summarization Evaluation: An Overview |
|
Proceedings of the NTCIR Workshop 2 Meeting on Evaluation of Chinese and Japanese Text Retrieval and Text Summarization. Tokyo: National Institute of Informatics. 2001. |
|
PDF |
|
This paper provides an overview of different methods for evaluating automatic summarization systems. The challenges in evaluating summaries are characterized. Both intrinsic and extrinsic approaches are discussed. Methods for assessing informativeness and coherence are described. The advantages and disadvantages of specific methods are assessed, along with criteria for choosing among them. The paper concludes with some suggestions for future directions. (AU) |
|
Summarization Evaluation; Intrinsic; Extrinsic; Informativeness; Coherence. |
Assessment |
|
|
|
|
Over, Paul; Dang, Hoa; Harman, Donna |
|
DUC in context |
|
Information Processing & Management, Nov 2007, Vol. 43 Issue 6, p1506-1520
|
Online (04/2008) (Only UGR)
|
Recent years have seen increased interest in text summarization with emphasis on evaluation of prototype systems. Many factors can affect the design of such evaluations, requiring choices among competing alternatives. This paper examines several major themes running through three evaluations: SUMMAC, NTCIR, and DUC, with a concentration on DUC. The themes are extrinsic and intrinsic evaluation, evaluation procedures and methods, generic versus focused summaries, single- and multi-document summaries, length and compression issues, extracts versus abstracts, and issues with genre. (DB)
|
Automatic summaries; evaluation |
Assessment |
|
|
|
|
SAGGION, Horacio; BONTCHEVA, Kalina; CUNNINGHAM, Hamish |
|
Robust generic and query-based summarization
|
In Proceedings of the 10th Conference of the European Chapter of the Association for Computational Linguistics, Budapest, Hungary, April 12-17, 2003.
|
PDF |
|
We present a robust summarisation system developed within the GATE architecture that makes use of robust components for semantic tagging and coreference resolution provided by GATE. Our system combines GATE components with well-established statistical techniques developed for the purpose of text summarisation research. The system supports "generic" and query-based summarisation, addressing the need for user adaptation. (AU)
|
GATE; summarization
Assessment |
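
As a generic illustration of query-based extractive summarisation (the general technique this entry refers to, not the GATE-based system itself), the sketch below ranks a document's sentences by cosine similarity between their term-frequency vectors and the query, and returns the top-ranked sentences in document order. The tokenizer, the sentence splitter, and the example text are simplifications introduced here.

    # Generic illustration of query-based extractive summarisation:
    # rank sentences by cosine similarity to the query and keep the top k.
    # A simplification, not the GATE-based system described above.
    import math
    import re
    from collections import Counter

    def tokenize(text):
        """Lower-case word tokenizer (simplification)."""
        return re.findall(r"[a-z0-9]+", text.lower())

    def cosine(a: Counter, b: Counter) -> float:
        """Cosine similarity between two term-frequency vectors."""
        dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def query_based_summary(document: str, query: str, k: int = 2) -> str:
        """Return the k sentences most similar to the query, in document order."""
        sentences = re.split(r"(?<=[.!?])\s+", document.strip())
        q_vec = Counter(tokenize(query))
        scored = sorted(enumerate(sentences),
                        key=lambda s: cosine(Counter(tokenize(s[1])), q_vec),
                        reverse=True)[:k]
        return " ".join(sent for _, sent in sorted(scored))

    doc = ("Text summarisation systems reduce documents to shorter texts. "
           "Evaluation of such systems remains difficult. "
           "Query-based summaries focus on the information a user asked for.")
    print(query_based_summary(doc, "query-based summaries for users", k=1))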
|
|
|
|