Using corpora in critical discourse analysis Corpus Linguistics Richard Xiao

Using corpora in
critical discourse analysis
Corpus Linguistics
Richard Xiao
Aims of this session
• Lecture
– Corpora versus critical discourse analysis
– The state of the art of corpus-based discourse studies
– Case study: How is Islam constructed in the UK and US
press before and after 9/11?
• Lab session
– Using Wmatrix to explore political discourse:
Michael Howard’s and Tony Blair’s farewell speeches
to their parties
Critical discourse analysis (CDA)
• Discourse
– Language use above the sentence level
– Language use in context
– Real language use
• CDA examines language as a form of cultural
and social practice, focusing on the
relationship between power and discourse,
and between language and ideology
CL vs. CDA
• Both rely heavily on real language
• ‘a cultural divide’ (Leech 2000: 678-680)
– CDA emphasizes the integrity of text while CL tends to use
representative samples
– CDA is primarily qualitative while corpus linguistics is
essentially quantitative
– CDA focuses on the contents expressed by language while
CL is interested in language (form) per se
– The collector, transcriber and analyst are often the same
person(s) in CDA while this is rarely the case in CL
– The data used in CDA is rarely widely available while
corpora are typically made widely available
A diminishing divide…
• Some important ‘points of contact’ (McEnery
and Wilson 2001: 114)
– The common computer-aided analytic techniques
– The great potential of standard corpora in CDA as
control data
Use of corpora in CDA: pros and cons
• Cons…
– The corpus-based approach tends to obscure ‘the character
of each text as a text’ and ‘the role of the text producer and
the society of which they are a part’ (Hunston 2002: 110)
• CL focuses on text, not text producer
– Analyzing a lot of text from a corpus simultaneously would
force the analyst to lose ‘contact with text’ (Martin 1999:
52)
• Pros…
– Corpora present a real opportunity to discourse analysis,
because the automatic analysis of a large number of texts
at one time ‘can throw into relief the non-obvious in a
single text’ (Partington 2003: 7)
Use of corpora in CDA: pros and cons
• Pros
– ‘Obviously, the methods for doing a ‘critical discourse analysis’
of corpus data are far from established yet. Even when we have
examined a fairly large set of attestations, we cannot be certain
whether our own interpretations of key items and collocations
are genuinely representative of the large populations who
produced the data. But we can be fairly confident of accessing a
range of interpretative issues that is both wider and more
precise than we could access by relying on our own personal
usages and intuitions. Moreover, when we observe our own
ideological position in contest with others, we are less likely to
overlook it or take it for granted.’ (de Beaugrande 1999: 287)
CL and CDA: interaction and synergy
• Partington (2003: 12) proposes a scalar view of the
uses of CL, pointing towards a rationale for using
CL-related methods to carry out CDA
– ‘At the simplest level, corpus technology helps find other
examples of a phenomenon one has already noted. At the
other extreme, it reveals patterns of use previously
unthought of. In between, it can reinforce, refute or revise
a researcher’s intuition and show them why and how
much their suspicions were grounded.’
• Partington (2004, 2006) provides a systematic
description of CADS (corpus-assisted discourse
studies)
CL and CDA: interaction and synergy
• Complementary to each other, with interaction benefiting
both areas of research
• CL can provide a general ‘pattern map’ of the data,
mainly in terms of frequencies, key words/clusters and
collocations, as well as their diachronic development (the
latter contributing to the historical perspective in DHA:
Discourse Historical Approach represented and
pioneered by Ruth Wodak), which helps pinpoint specific
periods for text selection or sites of interest
• The CDA analysis can point towards patterns to be
further explored through the CL lens and also provide
explanations for corpus findings
CL and CDA: interaction and synergy
• CL can also examine frequencies (or at least provide
strong indicators of the frequency) of specific
phenomena recognized in CDA (e.g., topoi, topics,
metaphors) by examining lexical patterns
• CL can add a quantitative dimension to CDA to make it
more objective
• CL in general and concordance analysis in particular
can be positively influenced by exposure and
familiarity with CDA analytical techniques
CL and CDA: interaction and synergy
• CL needs to be supplemented by the close analysis of
selected texts using CDA theory and methodology
• CDA, in turn, can benefit from incorporating more
objective, quantitative CL approaches, as
quantification can reveal the degree of generality of,
or confidence in, the study findings and conclusions
in CDA
Possible stages in CADS
Baker et al. (2008: 295)
Construction of Islam
in UK and US press around 9/11
• How do news stories construct Islam?
• Have there been any changes before and after 9/11?
• Are there differences between reporting on Islam (as
a religion) and Muslims (as a people)?
• Are there any differences/similarities between
tabloids and broadsheets?
• Are there any differences/similarities between
American and British newspapers?
Why Islam?
• Post WWII – demand for unskilled
labour results in migration of
Pakistani and Bangladeshi Muslims
to the UK
• In April 2001 the former British
Foreign Secretary Robin Cook
reported that Britain’s national dish
is chicken tikka masala
• September 2001 – terrorist attacks
on the US, believed to be associated
with Islamic extremists
• July 2005 – terrorist attacks on UK
Data
• UK and US newspapers in 1998-2005 (pre- and post-9/11)
• 87 million words of British news
– Broadsheets (65 M words): The Business, The Guardian, The
Independent & Independent on Sunday, The Observer, The Times
& Sunday Times, Daily Telegraph & Sunday Telegraph
– Tabloids (22 M words): The Daily Express & Sunday Express, The
Daily Mail & Mail on Sunday, Daily Mirror & Sunday Mirror, The
People, Daily Star & Sunday Star, The Sun
• 40 million words of American news
– Financial Times, New York Times, Washington Post, San
Francisco Chronicle
Search terms related to Islam
• Alah OR Allah OR ayatolah OR burka! OR burqa! OR chador!
OR fatwa! OR hejab! OR imam! OR islam! OR Koran OR Mecca
OR Medina OR Mohammedan! OR Moslem! OR Muslim! OR
mosque OR mufti! OR mujaheddin! OR mujahedin! OR
mullah! OR muslim! OR Prophet Mohammed OR Q'uran OR
rupoush OR rupush OR sharia OR shari'a OR shia! OR shi-ite!
OR Shi'ite! OR sunni! OR the Prophet OR wahabi OR yashmak!
AND NOT Islamabad AND NOT shiatsu AND NOT sunnily
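The query above can be approximated in code. The following is a minimal sketch, not the tool used in the study: it assumes that a trailing "!" is a truncation wildcard (any continuation of the word), that matching is case-insensitive, and that "AND NOT" excludes a whole article if the unwanted word occurs anywhere in it. Only a subset of the slide's terms is listed, and the function name `matches_query` is hypothetical.

```python
import re

# Subset of the search terms from the slide; "!" is assumed to be a
# truncation wildcard, so "islam!" matches Islam, Islamic, Islamist, ...
TERMS = ["Allah", "burka!", "fatwa!", "imam!", "islam!", "Koran",
         "Moslem!", "Muslim!", "mosque", "shia!", "sunni!", "the Prophet"]
# Words that the wildcards would otherwise pick up by accident.
EXCLUDE = ["Islamabad", "shiatsu", "sunnily"]


def term_to_pattern(term):
    """Turn one query term into a regex fragment."""
    if term.endswith("!"):
        return re.escape(term[:-1]) + r"\w*"
    return re.escape(term)


INCLUDE_RE = re.compile(
    r"\b(?:" + "|".join(term_to_pattern(t) for t in TERMS) + r")\b",
    re.IGNORECASE,
)
EXCLUDE_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(t) for t in EXCLUDE) + r")\b",
    re.IGNORECASE,
)


def matches_query(text):
    """True if the text contains a search term and no excluded word."""
    return bool(INCLUDE_RE.search(text)) and not EXCLUDE_RE.search(text)


print(matches_query("The imam spoke at the mosque."))
print(matches_query("She booked a shiatsu massage near the mosque."))
```

Under this strict reading, an article that mentions both a mosque and Islamabad is dropped; a real retrieval system may treat the exclusions differently.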
Frequencies of articles over time
[Chart: number of articles per month, 1998-01 to 2005-07; vertical axis 0 to 4000; one point is labelled 2011-09 in the source, presumably 2001-09]
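A chart of this kind rests on a simple count of articles per month. The sketch below shows one way to produce such counts; the dates are invented for illustration and are not taken from the corpus.

```python
from collections import Counter
from datetime import date

# Hypothetical publication dates, one per retrieved article.
article_dates = [
    date(2001, 8, 30),
    date(2001, 9, 12),
    date(2001, 9, 15),
    date(2001, 9, 28),
    date(2001, 10, 2),
]


def monthly_counts(dates):
    """Count articles per calendar month, keyed as 'YYYY-MM'."""
    return Counter(d.strftime("%Y-%m") for d in dates)


counts = monthly_counts(article_dates)
for month in sorted(counts):
    print(month, counts[month])
```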
Method
1. Corpora split into 4:
UK pre 9/11 (27 million)
US pre 9/11
UK post 9/11 (60 million)
US post 9/11
2. All sub-corpora compared to a reference corpus (BNC written
– 90 million words)
3. UK sub-corpora compared with US sub-corpora
4. Keywords extracted and analysed via concordances with
respect to moral panic categories
5. UK broadsheets vs. UK tabloids
6. Collocational and concordance analysis of Islam, Islamic,
Muslim, Muslims
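Step 6 can be illustrated with a small sketch of concordance (KWIC) lines and collocate counts for the four node words. It is only an illustration under simple assumptions: the tokenizer is crude, the window is a fixed number of tokens that ignores sentence boundaries, and collocates are ranked by raw co-occurrence rather than by an association measure. The sample sentence is invented.

```python
import re
from collections import Counter

NODES = {"islam", "islamic", "muslim", "muslims"}


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def collocates(tokens, nodes, span=4):
    """Count words within `span` tokens either side of any node word."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok in nodes:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(w for w in window if w not in nodes)
    return counts


def kwic(tokens, nodes, span=4):
    """Return one key-word-in-context line per node occurrence."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok in nodes:
            left = " ".join(tokens[max(0, i - span):i])
            right = " ".join(tokens[i + 1:i + 1 + span])
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines


sample = ("Community leaders said British Muslims want peace. "
          "Muslim leaders condemned the attacks.")
tokens = tokenize(sample)
coll = collocates(tokens, NODES, span=2)
for line in kwic(tokens, NODES, span=2):
    print(line)
print(coll.most_common(3))
```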
Moral panic
• Conceived by Stanley Cohen
(1972) in his study of Mods
and Rockers in the UK
– Violent clash between the gangs
of Mods and Rockers in 1964
– Two conflicting British
subcultures in the mid 1960s
• Referring to the intensity of
feeling expressed by a large
number of people about a
specific group of people who
appear to threaten the social
order at a given time
Features of moral panic
• Build-up of concern over a social issue
• A scapegoat (social group)
• Solutions proposed: moral entrepreneurs
– A person or group that seeks to influence a social group to adopt or
maintain a norm, e.g. MADD (Mothers Against Drunk
Driving) and the anti-tobacco lobby
• Moral panic is often expressed as outrage rather
than fear
• Emotive language is used
• Threat is normally exaggerated
McEnery’s (2005) moral panic categories
• 1. object of offence
– that which is identified as problematic
• 2. consequence
– the negative results which it is claimed will follow
if the object of offence is not eliminated
• 3. corrective action
– the actions to be taken to eliminate the object of
offence
McEnery’s (2005) moral panic categories
• 4. desired outcome
– the positive results which will follow from the elimination
of the object of offence
• 5. moral entrepreneur
– the person/group campaigning against the object of
offence
• 6. scapegoat
– that which is the cause of, or which propagates the cause
of offence
• 7. rhetoric
– register marked by a strong reliance on evaluative lexis
that is polar and extreme (strong language)
UK keywords pre 9/11
• No evidence of moral panic
• References to Iraq, Israel, Kosovo, Palestine
• Muslims often mentioned ‘in passing’ rather
than as main subject of article
• A wider range of contexts pre 9/11
– fashion, famous, tourists, music, hotel, cricket, sex,
leisure, dance, ski, museum, divorce, café, wine,
gardens, film, beer, holidays, football, exotic, fun
UK - After 9/11
• British Muslims and what they believe
– ‘"The vast, vast majority, of Muslims living in the
UK support policing efforts, fear terrorism and
want to work with us," said [Sir Ian].’ (The
Guardian, October 29, 2004).
• Focus on belief
– moderate, militants, fanatics, fundamentalist,
extremists
• Focus on immigration, political correctness
and scroungerphobia (taxpayers)
UK moral panic post 9/11?
Category | Positive keywords in that category
Consequence | anger, angry, bad, bombing, bombings, conflict, crime, dead, death, destruction, died, evil, fear, fears, injured, kill, killed, killing, murder, terror, threat, victims, violence, wounded, wrong
Corrective action | arrested, fight, fighting, invasion, jail, justice, moderate, occupation, police, revenge, troops
Desired outcome | best, better, freedom, good, peace, support
Moral entrepreneur | America, American, Britain, British
Object of offence | atrocities, attack, attacks, bomb, bombs, criminal, extremism, failed, hatred, illegal, jihad, radical, regime, terrible, terrorism, weapons
Scapegoat | Arab, (suicide) bombers, enemy, extremists, immigrants, Iran, Iraq, Iraqi, Islam, mosque, Muslim, Muslims, Pakistan, Palestinian, religious, suicide, terrorists
Rhetoric | question, need, must, why
US – before 9/11
• Keywords are mainly proper nouns relating to
Israel/Palestine, Bosnia, Kosovo, Indonesia.
• Peace is a keyword – focus on contexts where
Muslims are aggressed against
• Muslims (occasionally cast as internal to the
US)
US keywords post 9/11
Category | Positive keywords in that category
Consequence | attacks, Sept
Corrective action | American, Americans, forces, intelligence, marine, marines, military, officials, (war on) terror, war (on terror)
Desired outcome | NONE
Moral entrepreneur | Bush, pentagon, United States, US
Object of offence | Terrorism
Scapegoat | (al) Qaeda, afghan, Afghanistan, al (Qaeda), bin (laden), (Saddam) Hussein, Hussein’s, insurgents, Iraq, Iraq’s, Iraqi, Iraqis, (bin) Laden, Saddam (Hussein), Shiite, Shiites, Sunni, Taliban, terrorist, terrorists
Rhetoric | NONE
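To compare the UK and US results category by category, the two keyword tables can be held in a simple mapping from category to keyword list. The sketch below copies the keywords from the slides; the helper names `category_sizes` and `empty_categories` are hypothetical, and the assignment of keywords to categories remains the analyst's manual judgement based on concordances, not something the code decides.

```python
# Keywords per moral panic category, copied from the UK and US slides.
UK_POST = {
    "Consequence": [
        "anger", "angry", "bad", "bombing", "bombings", "conflict", "crime",
        "dead", "death", "destruction", "died", "evil", "fear", "fears",
        "injured", "kill", "killed", "killing", "murder", "terror", "threat",
        "victims", "violence", "wounded", "wrong"],
    "Corrective action": [
        "arrested", "fight", "fighting", "invasion", "jail", "justice",
        "moderate", "occupation", "police", "revenge", "troops"],
    "Desired outcome": ["best", "better", "freedom", "good", "peace", "support"],
    "Moral entrepreneur": ["America", "American", "Britain", "British"],
    "Object of offence": [
        "atrocities", "attack", "attacks", "bomb", "bombs", "criminal",
        "extremism", "failed", "hatred", "illegal", "jihad", "radical",
        "regime", "terrible", "terrorism", "weapons"],
    "Scapegoat": [
        "Arab", "(suicide) bombers", "enemy", "extremists", "immigrants",
        "Iran", "Iraq", "Iraqi", "Islam", "mosque", "Muslim", "Muslims",
        "Pakistan", "Palestinian", "religious", "suicide", "terrorists"],
    "Rhetoric": ["question", "need", "must", "why"],
}

US_POST = {
    "Consequence": ["attacks", "Sept"],
    "Corrective action": [
        "American", "Americans", "forces", "intelligence", "marine",
        "marines", "military", "officials", "(war on) terror",
        "war (on terror)"],
    "Desired outcome": [],
    "Moral entrepreneur": ["Bush", "pentagon", "United States", "US"],
    "Object of offence": ["Terrorism"],
    "Scapegoat": [
        "(al) Qaeda", "afghan", "Afghanistan", "al (Qaeda)", "bin (laden)",
        "(Saddam) Hussein", "Hussein's", "insurgents", "Iraq", "Iraq's",
        "Iraqi", "Iraqis", "(bin) Laden", "Saddam (Hussein)", "Shiite",
        "Shiites", "Sunni", "Taliban", "terrorist", "terrorists"],
    "Rhetoric": [],
}


def category_sizes(table):
    """Number of keywords assigned to each category."""
    return {category: len(words) for category, words in table.items()}


def empty_categories(table):
    """Categories for which no keyword was found."""
    return [category for category, words in table.items() if not words]


print(category_sizes(UK_POST))
print(category_sizes(US_POST))
print(empty_categories(US_POST))
```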
Tabloids vs. Broadsheets
• Style and spelling
– Tabloids (chatty, interactive style)
Pronouns: I, my, me, myself, we, he, she
Emphatic adjectives: stunning, fantastic, terrible,
wonderful
– Broadsheets (logical, formal, ‘nouny’ style)
Conjunctions/determiners: the, that, which, however, thus,
than
Formal terms of address: Mr, Ms
Moslem – key in the tabloids
• 7,282 tabloid uses
• 4,834 in the Daily Mail
• 2,208 Daily Express
[Chart: frequency over time of Moslem(s) vs. Muslim(s) in the tabloids, January 1998 to 2005; vertical axis 0 to 800]
‘Bin Laden’ in tabloid newspapers
• powerful (mastermind, terrorist godfather,
millionaire, Al Qaeda leader)
• warrior leader (chief, warlord)
• outcast (dissident, exile, fugitive)
• insane (maniac, twisted)
• evil (gloating menace, evil, terrorist,
murderous)
• fanatical (extremist, fanatic, fanatical)
Tabloid villains
• Direct references to terrorist attacks
– terror, terrorists, Taliban, Osama, Bin, Laden,
bomb, bombs, bomber, bombers, plane, suicide,
killers, attack, crash, hijack, September, twin and
towers
• Emotive/evaluative reaction: emotionally
charged lexis
– atrocity, atrocities, tragedy, carnage, horror,
terrible, evil
Other tabloid categories
• Brainwashing
– lure, rant, rants, spew, rouser, brainwashed
“Children are being brainwashed into becoming Islamic extremists
at 300 "Taliban schools" in Britain, it was reported last night.
Youngsters are being indoctrinated with radical Islamic ideals by
militant groups across the country, said leading British Muslim Dr
Zaki Badawi.” (The Sun, December 28, 2001)
• Also, ‘scroungerphobia’ and political correctness
Types of belief in tabloid vs. broadsheet
• In the tabloids, Muslims are fanatics and
extremists
• In the broadsheets, Muslims are radicals,
fundamentalists, separatists but also
moderates and progressives
Broadsheet keywords
• More focus on Islam
– The media: book, novel, television, film, poetry
– Other religions: Hindu, Christian, Buddhist, Judaism
– World events: Iran, Iraq, Iraqi, Arab, Israeli, Israel,
Palestinian, Baghdad, Jerusalem, Lebanon, Syria
– War and conflict: military, conflict, army, resistance,
violence, occupied, ceasefire, genocide, peace, invasion
Muslim(s) vs. Islam(ic)
• Tabloids: more focus on Muslims (the people)
– Muslims as terrorists; evil preachers; Muslims as British and
desiring peace; women as victims (honor killings, arranged
marriage, hijab); men as potential terrorists or victims of
racism
• Broadsheets: more focus on Islam (as a religion)
– Stories on terrorism restricted to the word Islamic
Political discourse: Howard vs. Blair
• Use Wmatrix to tag the following two texts
– Tip: it is good practice to create one folder for each
file
• Michael Howard’s farewell speech to his party
(2005)
– Leader of the Conservative Party in 2003-2005
• Tony Blair’s farewell speech to his party (2006)
– Leader of the Labour Party in 1994-2007 (Prime Minister in 1997-2007)
A quick “how to”!
• Enter new workarea name (Blair /
Howard)
• Click the browse button to select
the right file
• Click the “upload now” button …
• A new screen will provide you
with an update report … e.g.
part of speech tagging
semantic tagging
frequency lists
You will then be taken to your work area
[My folders]
What you’ll see in the Simple “VIEW of folder”
Click on Frequency to see the most frequent words: what are they?
You can also do concordance searches of words/phrases
Scroll down to see Tag clouds - “key” concepts
--- and investigate Word clouds (= the most “key” words)
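Outside Wmatrix, a basic word frequency list can be produced in a few lines. This is a rough sketch with a simple tokenizer, not a reproduction of Wmatrix's own processing; the sample text is invented and is not taken from either speech.

```python
import re
from collections import Counter


def frequency_list(text, top=10):
    """Return the `top` most frequent word forms with their counts."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(tokens).most_common(top)


sample = ("We have achieved much, and we will achieve more. "
          "We owe it to the country.")
for word, count in frequency_list(sample, top=5):
    print(word, count)
```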
The word cloud of Howard’s farewell speech
(compared with Blair)
We use a similar method to investigate
keywords (as with WordSmith),
i.e. we compare text A with text B
… so that we can discover the most
significant items within text A
… and not only the frequent items
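The comparison of text A with text B is usually scored with a log-likelihood (LL) statistic. The sketch below assumes the standard two-corpus log-likelihood formulation; the counts are hypothetical and only illustrate why a very frequent word need not be a keyword.

```python
import math


def log_likelihood(a, b, c, d):
    """Two-corpus log-likelihood.

    a: frequency of the word in text A
    b: frequency of the word in text B
    c: total number of words in text A
    d: total number of words in text B
    """
    # Expected frequencies if the word were equally likely in both texts.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


# Hypothetical counts in two texts of 10,000 words each.
# A very frequent word used equally often in both texts is not "key" ...
print(log_likelihood(600, 600, 10_000, 10_000))
# ... whereas a rarer word used much more often in text A is.
print(log_likelihood(50, 10, 10_000, 10_000))
```

Note that LL on its own does not say in which text the word is over-used; that is read off from the relative frequencies a/c and b/d.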
Exploring keywords (as word clouds) in simple view
Under 3. Word clouds,
scroll down the pop-up menu
to choose Blair
Then click on Go
- and any keywords with LL15+ will appear
Advanced View of Howard Folder
Click on Frequency to see the most frequent words (as before)
--- and investigate key parts of speech (POS)
and key concepts / domains
How might we discover the most ‘frequent’ POS? Jot them down
… and the most ‘frequent’ semantic fields? Make a note of them
We can also see all of the keywords using this VIEW
Frequency of words in Howard and Blair
(using advanced view)
Make a note of the similarities and differences …
Exploring keywords using advanced view
Find the “key words compared to:”
drop-down menu, and click Go
You will be taken to a web-page,
which shows ALL keywords …
Keywords for Howard
(when compared with Blair)
IMPORTANT
– anything above LL 15 = 99.99%
confidence of significance (p < 0.0001; the exact cut-off is 15.13)
– anything above LL 6.63 = 99%
confidence of significance (p < 0.01)
• How many keywords from the Howard
text have LL values of 15+? What are
they?
• How many keywords have LL values of
7+? What are they?
• Do you notice anything interesting
about these keywords?
• Do any of the keywords share the same
semantic fields?
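The LL cut-offs quoted on this slide come from the chi-square distribution with one degree of freedom. As a minimal sketch, assuming that LL is interpreted against that distribution, an LL value can be converted to a p-value with the standard library alone (the function name `ll_p_value` is hypothetical):

```python
import math


def ll_p_value(ll):
    """p-value of an LL score under chi-square with 1 degree of freedom.

    For one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(ll / 2))


for cutoff in (3.84, 6.63, 15.13):
    print(cutoff, round(ll_p_value(cutoff), 5))
```

The three cut-offs correspond approximately to p = 0.05, p = 0.01 and p = 0.0001.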
Same procedure for key POS and key domains
Find the “key POS compared to:” drop-down menu, and click Go
Find the “key concepts compared to:” drop-down menu, and click
Go
Exploring key domains
(Howard, in comparison to Blair)
• What do you notice about the
“key” domains?
• Do we capture more words by
undertaking a key domain
analysis than we do by
undertaking a keyword analysis?
And, if so, why do you think this
is the case?
• Undertake a keyword analysis of
Blair (using Howard as the
reference corpus) to determine
the differences between the
two speeches
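One way to think about the key domain question is that a domain pools the frequencies of all the words tagged with it, so several individually infrequent words can add up to a sizeable count. The sketch below illustrates only this pooling step. The mini-lexicon and its labels are hypothetical stand-ins, not the semantic tagset used by Wmatrix, and the word counts are invented.

```python
from collections import Counter

# Hypothetical mini-lexicon: word -> semantic field label.
LEXICON = {
    "hospital": "Medicine",
    "nurses": "Medicine",
    "doctors": "Medicine",
    "patients": "Medicine",
    "schools": "Education",
    "teachers": "Education",
}


def domain_frequencies(word_freqs, lexicon):
    """Sum word frequencies within each semantic field."""
    domains = Counter()
    for word, freq in word_freqs.items():
        field = lexicon.get(word)
        if field is not None:
            domains[field] += freq
    return domains


# Invented word counts: no single word is frequent on its own.
word_freqs = Counter({"hospital": 2, "nurses": 3, "doctors": 2,
                      "patients": 3, "schools": 4, "teachers": 1})
domains = domain_frequencies(word_freqs, LEXICON)
print(domains.most_common())
```

The pooled domain counts can then be compared across the two speeches in the same way as word counts.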