Stylistics and stylometry - State University of Zanzibar

Stylistics and stylometry
What is “style”?
• Term not much loved by linguists
– Too vague
– Has connotations in neighbouring fields (“style” = good style, ie a
value judgment)
• Many books/articles make reference to etymology of the
word (Lat. stilus = ‘pen’), so it follows that style is mainly
about written language
• Various definitions, some very close to things already
seen (especially “register”)
• Two main aspects are widely assumed:
– style is choice
– style is described by reference to something else
2/28
Style as choice
• For any intended meaning there is a range of
alternative ways of expressing that meaning
• Different choices express nuances
– of meaning
– of other things (style?) eg buy vs purchase
• Example:
– Visitors are respectfully informed that the coin
required for the meter is 50p; no other coin is
acceptable
– 50p pieces only
– Propositional meaning is the same; difference in
expression conveys something else (register etc)
3/28
Style as choice
• Style is a choice, but often the “choice” is
somewhat predetermined
• ie a choice between appropriate and
inappropriate style
• So maybe “style” is just another word for
register?
4/28
Style and the norm
• Some writers define style as
– “individual characteristics of a text”
– “total sum of deviations from a norm”
• But what is the “norm”?
– Is there some form of the language that is neutral as
regards style/register?
– Note also that the norm shifts: eg the Authorized Version (AV)
of the Bible was written in the vernacular of its time
• Literary stylistics focuses on the exceptional
5/28
• Even if there is no norm, we can describe
style comparatively
– Stylistics mainly involves comparing and
contrasting texts
– and associating linguistic variance with
contextual explanation
• Some authors see style as being what is
added to the text
6/28
Stylistic analysis
• Gulf between literary vs linguistic stylistics
– Lit crit focuses on effect on the reader,
intended or otherwise, so largely intuitive and
subjective
– Linguistic stylistics looks for
characterisations of style (including literary
style) in terms of linguistic phenomena at the
various levels of linguistic description
7/28
Stylistic analysis
• Inventory of linguistic devices and their effect
– usually in a contrastive way:
– in contrast with other writers in a similar genre
– in contrast with other genres
• Linguistic devices described in terms of the
usual linguistic levels of description: phonology,
morphology, lexis, grammar, etc.
• Effects can be direct (expressive) or indirect
(by association)
– example: onomatopoeia vs alliteration as a
phonological device
8/28
Stylistic analysis
Crystal & Davy (1969) Investigating English Style
• Informally identify stylistic features felt to
be significant
• Devise a method of analysis which
facilitates comparison between usages
• Identify the stylistic function of the
features so identified
9/28
Types of features
• “Invariable” features due to the individual or the time –
usually of little interest
• Discourse features
– medium (= Halliday’s mode), what features distinguish written
language from spoken language
– participation: eg monologue vs dialogue
• Province (= field) lexis and syntax
• Status (= tenor) features relating to relative social
standing of writer/speaker and reader/listener
• Modality (= text type) eg message delivered as a letter,
postcard, text message, email, etc
• Singularity: deliberate occasional idiosyncrasies
10/28
Method and function
• Methods and features determine each other
– you can only measure features that you can extract
– simple counting features are easy to extract
– more complex features can be extracted thanks to
NLP techniques of corpus annotation (tagging,
parsing, etc)
• Describing the function of observed differences
– could be based on intuition
– or (see later) partially automated (factor analysis)
11/28
What to count
• Simple things may characterise different styles
– average sentence length
– average word length
– type:token ratio (vocabulary richness)
• number of types = number of different words
• number of tokens = total number of words
– vocabulary growth (homogeneity of text)
• number of new types in 1st, 2nd, …, nth 1000 words
• in rich varied text, number will climb steadily
• Especially when used comparatively
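The simple counts listed above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the tokenizer, the naive sentence splitter and the function names (`tokenize`, `naive_sentences`, `basic_counts`, `vocabulary_growth`) are assumptions made for the example.

```python
import re

def tokenize(text):
    # Assumption: a word is a run of letters, optionally joined by
    # internal apostrophes or hyphens (so "don't" is one token).
    return re.findall(r"[a-z]+(?:['’-][a-z]+)*", text.lower())

def naive_sentences(text):
    # Assumption: a sentence ends at ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def basic_counts(text):
    tokens = tokenize(text)
    sentences = naive_sentences(text)
    types = set(tokens)  # types = different words, tokens = all words
    return {
        "avg_sentence_length": len(tokens) / len(sentences),
        "avg_word_length": sum(len(t) for t in tokens) / len(tokens),
        "type_token_ratio": len(types) / len(tokens),
    }

def vocabulary_growth(tokens, window=1000):
    # Number of previously unseen types in each successive window of tokens.
    seen = set()
    growth = []
    for start in range(0, len(tokens), window):
        chunk = set(tokens[start:start + window])
        growth.append(len(chunk - seen))
        seen |= chunk
    return growth
```

Because the type:token ratio falls as a text gets longer, such figures are best compared across samples of similar length.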
12/28
What to count
• More complex analyses can give a more interesting
picture
– specific syntactic structures
– degree of modification in NPs
– types of verbs (eg verbs of persuasion, speech verbs, action
verbs, descriptive verbs)
– distribution of pronouns (1st/2nd/3rd person)
– etc … (anything you can think of)
• Quite sophisticated mathematical techniques can give an
overall picture
– eg factor analysis: identifies from a (big) range of variables which
ones best identify/characterize differences
13/28
Normalization and significance
• Always important to compare like with like
– It is usual when counting things to “normalize” over
the length of the text
– If one text is longer than the other, of course you
would expect higher frequencies of everything
• Issue of statistical significance
– Small differences may not really tell you anything
– Various measures can confirm whether difference is
statistically significant or due to random fluctuation
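Both points can be illustrated with a short sketch. The chi-square test on a 2x2 table is only one of the "various measures" and is chosen here as an assumption, since the slides do not name one; the function names are likewise invented for the example.

```python
def per_thousand(count, total_words):
    # Normalized frequency: occurrences per 1000 words of text.
    return 1000 * count / total_words

def chi_square_2x2(count_a, total_a, count_b, total_b):
    # Contingency table: the feature vs. all other words, text A vs. text B.
    # Assumes the feature occurs at least once across the two texts.
    observed = [
        [count_a, total_a - count_a],
        [count_b, total_b - count_b],
    ]
    grand = total_a + total_b
    row_totals = [total_a, total_b]
    col_totals = [count_a + count_b, grand - count_a - count_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2

# Critical value of chi-square with 1 degree of freedom at p = 0.05.
CRITICAL_05 = 3.841
```

A value of `chi_square_2x2(...)` above `CRITICAL_05` suggests that the difference in frequency between the two texts is unlikely to be random fluctuation; a value below it means the difference should not be given much weight.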
14/28
How to count
• How to recognize paragraph breaks?
• How to recognize sentence breaks?
– Headlines don’t end in a full stop
– Not all sentences end in a full stop
– Not all full stops are sentence-ending (abbreviations)
• How to count words
– Hyphenated words, contractions e.g. don’t
• How to measure word-length/complexity
– length only roughly corresponds to complexity
– number of characters vs number of syllables
– cf. through vs idea
– counting syllables implies either a dictionary or an algorithm
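Two of these problems can be sketched in Python: sentence splitting that does not break on abbreviations, and an algorithmic syllable count. Both are rough heuristics under stated assumptions (a tiny hand-picked abbreviation list, vowel-group counting), and the names `ABBREVIATIONS`, `split_sentences` and `count_syllables` are invented for the example.

```python
import re

# Assumption: a small illustrative list; a real one would be much longer.
ABBREVIATIONS = {"mr.", "mrs.", "dr.", "prof.", "e.g.", "i.e.", "etc.", "cf.", "vs."}

def split_sentences(text):
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        ends = token.endswith((".", "!", "?"))
        # A full stop ends a sentence only if the token is not an abbreviation.
        if ends and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:
        # Trailing material with no final stop, e.g. a headline.
        sentences.append(" ".join(current))
    return sentences

def count_syllables(word):
    # Heuristic: count groups of vowel letters, discounting a final silent "e".
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)
```

The heuristic gives 1 syllable for "through" but only 2 for "idea" (which has 3), a reminder that an algorithm of this kind is approximate and that a pronunciation dictionary is the more reliable option.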
15/28
More sophisticated counting
• Tagging and parsing allow you to look at
grammatical and lexical issues
– Use of particular POSs (conjunctions,
pronouns, auxiliaries, modals)
– Use of particular features (tenses, …)
– Use of particular constructions (passives,
interrogatives)
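Once a text has been tagged, counting particular parts of speech reduces to counting tags. The sketch below assumes the tagger's output is already available as (word, tag) pairs; the tag names and the function `pos_profile` are hypothetical and do not correspond to any particular tagger or tagset.

```python
from collections import Counter

def pos_profile(tagged_tokens, tags_of_interest):
    """Return occurrences per 1000 tokens for each tag of interest."""
    counts = Counter(tag for _, tag in tagged_tokens)
    total = len(tagged_tokens)
    return {tag: 1000 * counts[tag] / total for tag in tags_of_interest}

# Hypothetical tagger output with an illustrative tagset.
tagged = [
    ("She", "PRON"), ("may", "MODAL"), ("leave", "VERB"), ("and", "CONJ"),
    ("he", "PRON"), ("will", "MODAL"), ("stay", "VERB"), (".", "PUNCT"),
]

profile = pos_profile(tagged, ["PRON", "MODAL", "CONJ"])
```

Note that punctuation tokens are included in the total here; whether to count them is a design decision that should be applied consistently across the texts being compared.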
16/28
Quantifying register differences
• Much work based on corpora trying to
quantify and characterize register
differences
• Work pioneered by Douglas Biber
• Simple counts like the ones suggested
• Also, more complex computations
17/28
Example
[Chart: Exophoric and anaphoric referring expressions. Expressions per 200 words (axis 0 to 40) for anaphoric nouns, anaphoric pronouns and exophoric pronouns, across four registers: conversation, speech, news, academic.]
From D. Biber, S. Conrad & R. Reppen, Corpus Linguistics: Investigating
Language Structure and Use, Cambridge University Press, 1998.
Ch 5: the study of discourse characteristics
18/28
Multidimensional analysis
• Collect a huge range of measures of a
wide variety
– some simple word counts
– syntactic features
– classes and subclasses of N, V, Adj, Adv
• Factor analysis
19/28
20/28
~150 features in all
21/28
Factor analysis
• Statistical method to take large number of
apparently random variables and group
them together into “factors”
• Factors will be groups of (+ve and –ve)
features
• Linguist might then try to characterize the
factors in terms of some psycholinguistic
feature
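Factor analysis itself needs a statistics package, but its starting point, the correlations between features across texts, is easy to sketch. The code below is not a factor analysis: it only computes pairwise Pearson correlations, which show which features tend to rise together (positive) or in opposition (negative) and would therefore be candidates for the same factor. The function names are assumptions made for the example.

```python
from math import sqrt

def pearson(xs, ys):
    # Pearson correlation between two equal-length lists of feature values.
    # Assumes neither list is constant (otherwise the denominator is zero).
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def correlation_table(features):
    # features maps a feature name to its normalized frequency in each text,
    # with the texts in the same order for every feature.
    names = list(features)
    return {
        (a, b): pearson(features[a], features[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
```

Given one list of per-text frequencies for each feature, `correlation_table` returns a value between -1 and 1 for every pair of features; strongly correlated pairs, whether positive or negative, are the ones a factor analysis would tend to group together.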
22/28
23/28
Example
• Biber took two Google classifications of
text types: “Home” and “Science”
• Harvested ~1500 webpages in each
category (3.74m words)
– originally got ~2500 webpages, but some
were not suitable
http://jan.ucc.nau.edu/biber/Web text types.ppt
24/28
25/28
Summary of analysis
26/28