Number of found records: 11
|
MARCU, Daniel |
|
The Rhetorical Parsing, Summarization, and Generation of Natural Language Texts. |
|
PhD thesis, University of Toronto, 1997. |
|
PDF |
|
This thesis is an inquiry into the nature of the high-level, rhetorical structure of unrestricted natural language texts, computational means to enable its derivation, and two applications (in automatic summarization and natural language generation) that follow from the ability to build such structures automatically. The thesis proposes a first-order formalization of the high-level, rhetorical structure of text. The formalization assumes that text can be sequenced into elementary units; that discourse relations hold between textual units of various sizes; that some textual units are more important to the writer's purpose than others; and that trees are a good approximation of the abstract structure of text. The formalization also introduces a linguistically motivated compositionality criterion, which is shown to hold for the text structures that are valid. The thesis proposes, analyzes theoretically, and compares empirically four algorithms for determining the valid text structures of a sequence of units among which some rhetorical relations hold. Two algorithms apply model-theoretic techniques; the other two apply proof-theoretic techniques. The formalization and the algorithms mentioned so far correspond to the theoretical facet of the thesis. An exploratory corpus analysis of cue phrases provides the means for applying the formalization to unrestricted natural language texts. A set of empirically motivated algorithms were designed in order to determine the elementary textual units of a text, to hypothesize rhetorical relations that hold among these units, and eventually, to derive the discourse structure of that text. The process that finds the discourse structure of unrestricted natural language texts is called rhetorical parsing. The thesis explores two possible applications of the text theory that it proposes. 
The first application concerns a discourse-based summarization system, which is shown to significantly outperform both a baseline algorithm and a commercial system. An empirical psycholinguistic experiment not only provides an objective evaluation of the summarization system, but also confirms the adequacy of using the text theory proposed here in order to determine the most important units in a text. The second application concerns a set of text planning algorithms that can be used by natural language generation systems in order to construct text plans in the cases in which the high-level communicative goal is to map an entire knowledge pool into text. (AU) |
|
rhetorical structure; summarization; automation; natural language processing |
|
|
|
|
MARCU, Daniel |
|
The Automatic Construction of Large-Scale Corpora for Summarization Research. |
|
In HEARST, M., GEY, F., TONG, R., (Eds), Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp 137-144, University of California, Berkeley, August 1999. |
|
On line (07/2005) (Only UGR) |
|
Summarization research is notorious for its lack of adequate corpora: today, there exist only a few small collections of texts whose units have been manually annotated for textual importance. Given the cost and tediousness of the annotation process, it is very unlikely that we will ever manually annotate for textual importance sufficiently large corpora of texts. To circumvent this problem, we have developed an algorithm that constructs such corpora automatically. Our algorithm takes as input an (Abstract, Text) tuple and generates the corresponding Extract, i.e., the set of clauses (sentences) in the Text that were used to write the Abstract. The performance of the algorithm is shown to be close to that of humans by means of an empirical experiment. The experiment also suggests extraction strategies that could improve the performance of automatic summarization systems. (AU) |
|
information system; document; summarization |
|
|
|
|
CONNAWAY, Lynn Silipigni; LOGAN, Rochelle; BROWN, Christopher |
|
Identifying and Representing Electronic Engineering Resources: A Case Study in Knowledge Management |
|
International Symposium on Research Development and Practice in Digital Libraries, vol 97 |
|
On line (15/06/2004) |
|
Current methods of access to the electronic resources offered by the Internet make little use of basic principles of information organization and retrieval, relying instead on relatively informal and, at times, ad hoc approaches. This creates problems in terms of the volume of information retrieved by a user of the Internet and the precision with which that information matches the user's information need. There is a plethora of engineering resources available on the Internet, yet no systematic method of retrieval is available to engineers who are in need of the most current information in their discipline. The Internet is often the only immediate source of the most current engineering resources. The purpose of this project is to identify electronic resources that could be of value to engineers and to represent these resources in a manner that enables engineers to make timely, informed decisions about the usefulness of the resources. This paper addresses the specific objectives of the project, which include: 1) the development of selection criteria for electronic engineering resources; 2) the identification of electronic resources of interest to engineers, as defined by the selection policy; and 3) the creation of abstracts for these electronic resources that will include at least two hyperlinks to other related electronic resources. (AU) |
|
Internet; engineers; electronic resources; knowledge management; digital library; abstracting; organization of information |
|
|
|
|