Division of Social and Organizational Psychology
The PROTAN software for computer-aided content analysis
Presentation
PROTAN (for PROTocol ANalyzer) is a computer-aided content analysis system.
"Computer-aided" means here that PROTAN takes over the many tedious tasks of
textual analysis that a human being can do but generally avoids, such as
counting words. Quite often, and without further notice, PROTAN does its job "by
default", that is, by assuming that parameters keep the values initially given
to the system. For instance, some tasks require as little input as a semicolon;
PROTAN retrieves the rest of the required information from its memory. PROTAN
never, however, performs automatic content analysis.
What kind of text can one analyze with the help of PROTAN?
PROTAN can handle any textual material, such as narratives, clinical interviews,
scientific publications, titles or abstracts of scientific journals across their
publication years, poetry, advertising blurbs, and many other forms of text. The
limitations of PROTAN are those imposed by statistical constraints, by the
unavailability of the dictionaries needed to analyze a particular sort of text,
or by a lack of hypotheses on the analyst's part.
The text itself must be presented to PROTAN in ASCII format and must not extend
beyond column 70. Columns 73 to 80 hold identifiers for interview, unit, and
speaker; what these identifiers mean is up to the analyst.
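The column layout described above can be sketched in a few lines of Python. This is only an illustration, not part of PROTAN: the function name `format_records` is hypothetical, and the way the eight identification columns are divided (three digits for the interview, three for the unit, two for the speaker) is an assumption, since the text only says that columns 73 to 80 carry these three indications.

```python
import textwrap

TEXT_WIDTH = 70      # the text must not extend beyond column 70 (from the source)
ID_START = 73        # identifiers occupy columns 73 to 80 (from the source)
RECORD_WIDTH = 80


def format_records(text, interview, unit, speaker):
    """Wrap `text` to 70 columns and append an 8-character identifier field.

    ASSUMPTION: the 3 + 3 + 2 digit split of columns 73-80 is invented for
    this sketch; the real split is whatever the analyst decides.
    """
    text.encode("ascii")  # raises UnicodeEncodeError if the text is not ASCII
    ident = f"{interview:03d}{unit:03d}{speaker:02d}"
    if len(ident) != RECORD_WIDTH - ID_START + 1:
        raise ValueError("identifier does not fit in columns 73-80")
    records = []
    for line in textwrap.wrap(text, width=TEXT_WIDTH):
        # Pad with blanks up to column 72 so the identifier starts at column 73.
        records.append(line.ljust(ID_START - 1) + ident)
    return records


if __name__ == "__main__":
    sample = "To be, or not to be, that is the question. " * 3
    for record in format_records(sample, interview=1, unit=2, speaker=3):
        print(record)
```

Every record produced this way is exactly 80 characters long, with columns 71 and 72 left blank.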
What does PROTAN do to help one analyze a text?
The aims of the PROTAN software
PROTAN is tuned to two very different tasks, corresponding to two different
content-analytic strategies (Weber, 1983). In the first, PROTAN addresses the
question of what the text looks like. Is it generally abstract? Does it become
ever more abstract, or less so? What is the profile of the main affective
connotations (Anderson & McMaster, 1986) of the text? For example, one could
show that the general mood in Hamlet, measured with Whissell's dictionary of
affect (Hogenraad, McKenzie, & Martindale, 1997; Whissell et al., 1986),
progresses as an inverted U, with the second branch going much lower than the
first. Such a finding is hardly surprising: we always suspected it to be so.
That is the very reason for picking this classic text. To accomplish this first
task, PROTAN relies on a series of semantic dictionaries that are part of the
system.
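As a rough illustration of this first strategy, the sketch below scores each segment of a text against a dictionary and returns one value per segment, which is the kind of profile whose shape one would then inspect. It is not PROTAN code: the function `segment_profile` is hypothetical, and the ratings in `toy_dictionary` are invented for the example and are not taken from Whissell's dictionary.

```python
def segment_profile(segments, dictionary):
    """Return the mean dictionary rating of each segment.

    segments   -- list of segments, each a list of lowercased words
    dictionary -- dict mapping a word to a numeric rating
    Words absent from the dictionary are ignored; a segment without any
    rated word gets None.
    """
    profile = []
    for words in segments:
        scores = [dictionary[w] for w in words if w in dictionary]
        profile.append(sum(scores) / len(scores) if scores else None)
    return profile


# Hypothetical ratings, invented for illustration only.
toy_dictionary = {"joy": 3.0, "love": 3.0, "grief": 1.0, "death": 1.0}

segments = [
    "the joy of love".split(),
    "grief and joy".split(),
    "death and grief".split(),
]

print(segment_profile(segments, toy_dictionary))  # prints [3.0, 2.0, 1.0]
```

With real material the segments would come from the segmentation step and the ratings from one of the semantic dictionaries; only the principle is shown here.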
The second task to which PROTAN is tuned is to answer the question of what the
text is talking about. What are its main themes? A theme, like any interest, is
never fixed, and we usually want to know how the interests in a text come and
go. The trick of PROTAN, as of Iker's WORDS system (Iker & Klein, 1974), from
which we took the idea, is to postulate that the relations between words carry
enough information for themes to emerge simply from analysing these relations.
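One simple way to make "relations between words" concrete is to correlate the frequencies of word pairs across segments: words that rise and fall together are candidates for a common theme. The sketch below does just that. It is an assumption about how such relations can be measured, not a description of what PROTAN or WORDS actually compute, and the names `pearson` and `word_correlations` are hypothetical.

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def word_correlations(segments, vocabulary):
    """Correlate the per-segment frequencies of every pair of words in `vocabulary`."""
    counts = [Counter(segment) for segment in segments]
    freq = {w: [c[w] for c in counts] for w in vocabulary}
    return {
        (a, b): pearson(freq[a], freq[b])
        for i, a in enumerate(vocabulary)
        for b in vocabulary[i + 1:]
    }


if __name__ == "__main__":
    toy_segments = [
        "king crown throne".split(),
        "king crown".split(),
        "ship sea".split(),
        "sea ship storm".split(),
    ]
    for pair, r in word_correlations(toy_segments, ["king", "crown", "ship", "sea"]).items():
        print(pair, round(r, 2))
```

In this toy corpus "king" and "crown" always occur in the same segments, while "king" and "ship" never do, so the first pair correlates positively and the second negatively. A matrix of such correlations is the usual starting point for grouping words into themes.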
The tools of the PROTAN software
To accomplish its tasks, PROTAN relies on three tools: segmentation,
lemmatization, and dictionaries.
Segmentation means what one would expect: dividing the text into as many parts as one feels appropriate. If possible, these segments should be meaningful units, e.g. letters, chapters of a book, or acts of a play. One can also divide the text into artificial units, e.g. segments of 700 words each, or one may have reasons to divide the text into 20 equal parts.
A single program, CSCUT, takes care of segmenting. This program can be complex,
and this step deserves great care, since all further analyses depend on it.
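The two artificial segmentation schemes mentioned above, fixed-size units and a fixed number of equal parts, can be sketched as follows. This is a minimal illustration in Python, not a description of how CSCUT works; the function names `cut_fixed` and `cut_equal` are hypothetical.

```python
def cut_fixed(words, size):
    """Split a list of words into consecutive segments of `size` words.

    The last segment may be shorter than `size`.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    return [words[i:i + size] for i in range(0, len(words), size)]


def cut_equal(words, parts):
    """Split a list of words into `parts` segments of (almost) equal length.

    Segment lengths differ by at most one word; the longer ones come first.
    """
    if parts < 1:
        raise ValueError("parts must be at least 1")
    base, extra = divmod(len(words), parts)
    segments, start = [], 0
    for k in range(parts):
        end = start + base + (1 if k < extra else 0)
        segments.append(words[start:end])
        start = end
    return segments


if __name__ == "__main__":
    words = ("word " * 1450).split()
    print([len(s) for s in cut_fixed(words, 700)])   # segments of 700 words each
    print([len(s) for s in cut_equal(words, 20)])    # 20 nearly equal parts
```

Meaningful units such as chapters or acts cannot be derived this way; they have to be marked by the analyst.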
Lemmatization is an ungainly term for the operation by which the various
inflected forms of words (plurals, conjugations, etc.) are reduced to a simpler
form, for example the infinitive for verbs.
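To give an idea of the operation, here is a deliberately crude sketch that combines a small table of irregular forms with a few suffix rules for English. It is a toy under stated assumptions, not the method used by PROTAN: the table `IRREGULAR`, the rules in `SUFFIX_RULES`, and the function `lemmatize` are all invented for this example.

```python
# Hypothetical exception table: irregular forms mapped to their base form.
IRREGULAR = {"was": "be", "were": "be", "is": "be", "went": "go", "children": "child"}

# Hypothetical suffix rules, tried in order: (ending, replacement).
SUFFIX_RULES = [("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")]

MIN_STEM = 3  # never strip an ending if fewer than 3 letters would remain


def lemmatize(word):
    """Reduce an English word to a simpler form using the toy rules above."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix, replacement in SUFFIX_RULES:
        # Do not treat the final "s" of words like "this" or "class" as a plural.
        if suffix == "s" and word.endswith(("ss", "is", "us")):
            continue
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)] + replacement
    return word


if __name__ == "__main__":
    for w in ["Stories", "walked", "words", "was", "this"]:
        print(w, "->", lemmatize(w))
```

Rules this simple inevitably make mistakes (for instance, "speed" would lose its final "ed"), which is why practical lemmatization depends on much larger, language-specific lists of endings and exceptions.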
Dictionaries are systems of categories (broad dimensions of the mind) that an
analyst may be interested in. PROTAN comes with several such dictionaries in
different languages; it is, indeed, moderately polyglot.
Standard Operating Procedures
PROTAN is composed of 30 programs. These programs are modular, which means that
each of them has a specific role in a chain. For instance, the program CRWSTRIP,
which lemmatizes words, takes its input from the program CSCUT (the one that
takes care of segmenting texts) and produces an output (a system file) to be
processed by other programs.
Every program produces at least one output, namely a listing of results. Some
programs produce several outputs: a listing of results together with a system
file ready to be used by the next program, a numeric file to be processed by a
statistical package, or both. In our analysis of Hamlet, the output from the
comparison between text and dictionary is sent to the SAS statistical package
for polynomial analysis. We did not equip PROTAN with statistical software.
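To illustrate what a polynomial analysis of such output involves, the sketch below fits a second-degree polynomial to one score per segment by ordinary least squares. It is a stand-in written in plain Python, not the SAS analysis used for Hamlet; the function name `fit_quadratic` and the sample scores are hypothetical.

```python
def fit_quadratic(y):
    """Least-squares fit of y = b0 + b1*t + b2*t**2, where t = 0, 1, 2, ...
    is the position of the segment. Returns [b0, b1, b2].
    """
    n = len(y)
    if n < 3:
        raise ValueError("at least three segments are needed")
    t = range(n)
    # Normal equations (X'X) b = X'y for a design matrix with columns 1, t, t^2.
    s = [sum(ti ** k for ti in t) for k in range(5)]               # sums of t^0 .. t^4
    rhs = [sum(yi * ti ** k for ti, yi in zip(t, y)) for k in range(3)]
    a = [[s[i + j] for j in range(3)] + [rhs[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting on the 3x4 augmented matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]


if __name__ == "__main__":
    scores = [1.0, 4.0, 5.0, 4.0, 1.0]   # invented segment scores
    b0, b1, b2 = fit_quadratic(scores)
    print(b0, b1, b2)
    if b2 < 0:
        print("inverted U, peaking near segment", -b1 / (2 * b2))
```

A negative coefficient on the squared term indicates an inverted U. Capturing an asymmetric curve, such as a second branch that falls lower than the first, would call for higher-order terms, which a statistical package handles more conveniently.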
A list of programs
Following is a list of the programs that are currently part of PROTAN, that is,
of the things the system can do. Not all of these programs are needed for a
successful run. Many of them serve to create or edit dictionaries or
striplists, or to edit the text. For convenience, the list is alphabetical.
Platforms
A distinctive feature of PROTAN is its portability across several platforms:
DOS, UNIX, and Macintosh. There is no installation procedure; the user can
install the 30 programs immediately and organize the inputs (texts, strip
dictionaries, parameter files) and outputs (listings and punch files) as
preferred. Punch files are formatted so that they can easily be exported to most
statistical packages.
Technical specifications
There are no minimum computer requirements, but with corpora of more than
100,000 words PROTAN will run faster on powerful platforms such as UNIX
systems. PROTAN is written in C. Each program has been tested in several studies
that relied on PROTAN. PROTAN has never been submitted for review in computer
software magazines or scientific journals.
Further information
Further information about the PROTAN software, or assistance with it, may be
obtained from Robert Hogenraad:
Office:
Dr. Robert Hogenraad
Psychology Department, Catholic University of Louvain
10 place du Cardinal Mercier
B-1348 Louvain-la-Neuve, Belgium
Ph.: ..32-(0)10-47 4411
Fax: ..32-(0)10-47 3774
E-mail: hogenraad@upso.ucl.ac.be
Private:
63 Avenue Constant Montald, B-1200 Brussels, Belgium
Ph. & Fax: ..32-(0)2-763 2012
Documentation and references
User's manual:
Hogenraad, R., Daubies, C., & Bestgen, Y. (1995). Une théorie et une
méthode générale d'analyse textuelle assistée par ordinateur. Le système PROTAN
(PROTocol ANalyzer) (Version March 2, 1995). Louvain-la-Neuve, Belgium:
Psychology Department, Catholic University of Louvain. (In French).
Bibliographic references:
Anderson, C. W., & McMaster, G. E. (1986). Modeling emotional tone in stories using tension levels and categorical states. Computers and the Humanities, 20(1), 3-9.
Bestgen, Y. (1994). Can emotional valence in stories be determined from words? Cognition and Emotion, 8(1), 21-36.
Hogenraad, R. (1991). Retratos de Fernando Pessoa. Revista de Comunicação e Linguagens, 14, 91-110.
Hogenraad, R. (1994). Über den Versuch, das Leben der Wörter zu messen. Inhaltsanalytische Verfahren und Literatur. In A. Barsch, G. Rusch, & R. Viehoff (Eds.), Empirische Literaturwissenschaft in der Diskussion (pp. 306-323). Frankfurt am Main: Suhrkamp.
Hogenraad, R., & Bestgen, Y. (1989). On the thread of discourse: Homogeneity, trends, and rhythms in texts. Empirical Studies of the Arts, 7(1), 1-22.
Hogenraad, R., Bestgen, Y., & Durieux, J. F. (1992). Psychology as literature. Genetic, Social, and General Psychology Monographs, 118(4), 455-478.
Hogenraad, R., Bestgen, Y., & Nysten, J. L. (1995). Terrorist rhetoric: Texture and architecture. In E. Nissan & K. M. Schmidt (Eds.), From information to knowledge. Conceptual and content analysis by computer (pp. 54-67). Oxford, England: Intellect.
Hogenraad, R., Boulard, R., & McKenzie, D. (1994). Les mots qui ont fait les relations industrielles. Québec: Presses de l'Université Laval.
Hogenraad, R., Boulard, R., & McKenzie, D. P. (in preparation). An assessment of the creativity of industrial relations journals: An integrative view. Journal of Organizational Behavior.
Hogenraad, R., Kaminski, D., & McKenzie, D. P. (1995). Trails of social science: The visibility of scientific change in criminological journals. Social Science Information, 34(4), 663-685.
Hogenraad, R., McKenzie, D. P., & Martindale, C. (1997). The enemy within: Autocorrelation bias in content analysis of narratives. Computers and the Humanities, 30(6), 433-439.
Hogenraad, R., McKenzie, D. P., Morval, J., & Ducharme, F. A. (1995). Paper trails of psychology: The words that made applied behavioral sciences. Journal of Social Behavior and Personality, 10(3), 491-516.
Iker, H. P., & Klein, R. H. (1974). WORDS: A computer system for the analysis of content. Behavior Research Methods & Instrumentation, 6(4), 430-438.
Weber, R. P. (1983). Measurement models for content analysis. Quality and Quantity, 17, 127-149.
Whissell, C., Fournier, M., Pelland, R., Weir, D., & Makarec, K. (1986). A dictionary of affect in language. IV. Reliability, validity, and applications. Perceptual and Motor Skills, 62, 875-888.