Ch. 8 Language Processing:
Humans and Computers
An Introduction to Language (9e, 2009)
by Victoria Fromkin, Robert Rodman
and Nina Hyams
Human Language Processing
• Psycholinguistics focuses on linguistic performance in speech
production and comprehension
• We usually don’t have problems producing or
understanding sentences in our language, and we do both
without effort or awareness
• Some grammatical sentences are difficult to understand
(The horse raced past the barn fell.), and some ungrammatical
sentences are easy to understand (*The baby seems sleeping.)
– This means that language processing is more than grammar alone—
there are psychological mechanisms that work with the grammar to
allow us to produce and comprehend language
The Speech Signal
• Speech sounds can be described by their acoustic (or physical) properties
• The vibrations of our vocal cords cause variations of air pressure, and
sounds we produce can be described in terms of:
– Fundamental frequency (pitch): how fast the variations of air pressure occur
– Intensity: the magnitude of the variations, which determines the loudness of
a sound
– The quality of a speech sound is determined by the shape of the vocal tract;
the shape affects how the sound waves travel
• Spectrograms, or voiceprints, can be created by computers and are used to
analyze speech sounds
– Spectrograms indicate the intensity, formants (the strongest harmonics
produced by the shape of the vocal tract during production), and pitch of
speech sounds and demonstrate how different speech sounds have
recognizably different acoustic properties
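To make this concrete, here is a minimal sketch of how a spectrogram can be computed and displayed with standard Python tools (SciPy and Matplotlib). The filename speech.wav and the window size are illustrative assumptions, not part of the original text.

```python
# Minimal spectrogram sketch: plot how intensity is distributed across
# frequency and time for a speech recording (assumes a mono WAV file).
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
from scipy.signal import spectrogram

fs, samples = wavfile.read("speech.wav")           # sampling rate, waveform
f, t, Sxx = spectrogram(samples, fs, nperseg=512)  # freq bins, time bins, power

plt.pcolormesh(t, f, 10 * np.log10(Sxx + 1e-10))   # intensity in dB
plt.xlabel("Time (s)")
plt.ylabel("Frequency (Hz)")   # formants show up as dark horizontal bands
plt.show()
```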
Speech Perception and Comprehension
• The “segmentation problem”: how do listeners carve up the
continuous speech signal into meaningful units?
– Lexical access (or word recognition) is the process of searching your
lexicon for phonological strings that correspond to words
– Stress and intonation provide clues about structure
• The “lack of invariance problem”: how do listeners
recognize different speech sounds when they are used in
different contexts and spoken by different people?
– Listeners can normalize their perceptions to account for rate of
speech and speaker pitch differences
Bottom-up and Top-down Models
• Understanding language in real time is an impressive feat, and there is a
certain amount of guesswork involved in real-time language
comprehension
• Many psycholinguists believe that language perception and
comprehension involves both:
– Top-down processing: using semantic and syntactic expectations to guide
the interpretation of lexical information in the sensory input
• Listeners can predict that if a speaker says the, then an NP is coming
• In experiments, listeners seem to make much use of top-down information
– Bottom-up processing: moving from the sensory phonetic input to phonemes,
then morphemes, etc. up to semantic interpretation
• Listeners wait to construct an NP until they hear the followed by a noun
Lexical Access and Word Recognition
• In order to discover more about lexical access or
word recognition, psycholinguists have devised
several experiments:
– Lexical decision experiments involve people deciding
whether or not a string of letters or sounds is a word
• Frequently used words such as car are responded to more quickly
than infrequent words such as fig
• This leads researchers to believe that frequent words are more
easily accessed in the lexicon than infrequent words
Lexical Access and Word Recognition
• A lexical decision about the word doctor will be faster if it has been
preceded by the word nurse
– This effect is called semantic priming and could be due to semantically
related words being stored in the same part of the lexicon
• Lexical access experiments show that people retrieve all the meanings of
a word
• Naming tasks require subjects to read printed words aloud and findings
that people read regularly spelled words faster than irregularly spelled
words show that:
– 1. People either (a) look for the string of letters in their lexicon and, if
they find it, pronounce the stored representation, or (b) if they don’t
recognize it, sound it out based on linguistic knowledge
– 2. The mind notices irregularity
Syntactic Processing
• Listeners need to build phrase structure representations of
sentences as they hear them in order to understand the
sentence
– They must place each incoming word in a grammatical category and
disambiguate messages
• Garden path sentences are ones that require listeners to shift
their analysis midway through the sentence
• After the child visited the doctor prescribed a course of injections
– Readers will naturally put the doctor into the slot of direct object for
the verb visited, but as the reader goes on they must change their
analysis and recognize the doctor as the subject of the main clause
instead
Syntactic Processing
• The mind uses two principles in parsing sentences that lead people astray
when encountering garden path sentences:
– Minimal attachment: build the simplest structure consistent with the
grammar of the language
– Late closure: attach incoming material to the phrase that is currently being
processed
• Memory constraints prevent the easy comprehension of a sentence like:
• Jack built the house that the malt that the rat that the cat that the dog worried killed ate
lay in.
– Performance constraints like this limit the number of sentences we are likely
to create out of the infinite possibilities
Syntactic Processing
• Shadowing tasks involve subjects repeating what they hear
as rapidly as possible
– Most people can shadow with a delay of 500 to 800 milliseconds, but
some people can shadow within one syllable (300 milliseconds behind)
– Fast shadowers correct speech errors even when told not to, and
corrections are more likely to occur when the target word is
predictable based on linguistic context
• These experiments provide evidence for top-down
processing and show how impressively fast listeners do
grammatical analysis
Speech Production: Planning Units
• Although speech sounds are linearly ordered, slips of the
tongue (including spoonerisms) reveal that speech is
conceptualized before it is uttered
• Intended: ad hoc
• Actual: odd hack
• The vowel sounds [æ] in the first word and [ɑ] in the second word were
reversed
– This type of error reveals that the second word was already planned
• Interestingly, phonological errors primarily occur in content
morphemes rather than function morphemes, and function
morphemes are not interchanged like content morphemes
Speech Production: Lexical Selection
• Word substitutions are seldom random; we tend to
accidentally replace a word with a semantically
related word
• Sometimes we produce a blend, which is part of one
word and part of another:
• splinters/blisters → splisters
• edited/annotated → editated
– Segments tend to stay in the same position in these blend
errors
Application and Misapplication
of Rules
• Sometimes speakers also make errors with morphological
and syntactic rules
– Rules may be applied to create possible but nonexistent words such
as ambigual
– Regular rules may accidentally be applied to irregular words as in
swimmed
• In an error such as saying a burly bird instead of an early bird,
the appropriate allomorph (a instead of an) is chosen even
though the speaker did not intend to produce a word
beginning with a consonant
– This tells us that the rule that chooses a or an must apply after early
was accidentally switched to burly
Nonlinguistic Influences
• Nonlinguistic factors can also contribute to speech
production
• Intended utterance: I’ve never heard of classes on Good Friday
• Actual utterance: I’ve never heard of classes on April 9th
– Good Friday was on April 9th that year, so even though
Good Friday and April 9th have nothing in common
phonologically or morphologically, the nonlinguistic
association was enough to prompt such an error
Computer Processing of
Human Language
• Computational linguistics:
– Is a subfield of linguistics and computer science
– Focuses on the interactions of human language and
computers
– Includes the analysis of:
• written texts and spoken discourse
• the translation of text and speech between languages
• the use of human language for communication between
computers and people
• the modeling and testing of linguistic theories
Computational Phonetics
and Phonology
• Computational phonetics and phonology is
concerned with processing speech
– Speech recognition: analyzing speech and
producing a transcription of it
– Speech synthesis: creating an electronic
simulation of speech to be “said” by the
computer
Speech Recognition
• Many interactive phone systems and cell phones have small vocabularies
that allow for a limited number of messages
– These systems search the speech signal for anything resembling stored
words
• More advanced systems must be trained on a speaker’s particular voice
and use phonotactics and statistical analysis to recognize speech
– If a sound that could be [l] or [r] occurs after a [d] sound, then the
computer knows that [r] is the correct sound, not [l], since English words
do not begin with [dl]
– If the computer cannot distinguish between [r] and [l] and therefore must
decide between rate and late in it’s too __, it can either use knowledge of
syntax to choose late, or use statistical knowledge that it’s too late
occurs much more often than it’s too rate and thus choose late (see the
sketch below)
• But people are much better at filtering out irrelevant sounds and
focusing on the voice of a single speaker (the cocktail party effect)
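As a concrete illustration of the statistical idea, here is a minimal Python sketch that chooses between acoustically confusable words by comparing bigram counts. The toy corpus is invented for illustration; a real system would use counts from a large corpus.

```python
# Decide between "late" and "rate" after "too" using bigram counts.
from collections import Counter

# Toy corpus standing in for a large collection of English text
corpus = ("it 's too late to go . it 's too late now . "
          "the rate is too high . it 's too late .").split()

bigrams = Counter(zip(corpus, corpus[1:]))  # counts of adjacent word pairs

def pick(prev_word, candidates):
    """Return the candidate that most frequently follows prev_word."""
    return max(candidates, key=lambda w: bigrams[(prev_word, w)])

print(pick("too", ["late", "rate"]))  # -> late
```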
Speech Synthesis
• Speech sounds can be reduced to a small number of acoustic
components that can be mixed together like a recipe, which
is known as formant synthesis:
– 1. Start with a tone at the same frequency as vibrating vocal cords
– 2. Emphasize the harmonics corresponding to the formants required
for a particular sound
– 3. Add hissing or buzzing for fricatives
– 4. and so on…
• Another approach is known as concatenative synthesis
which relies on recorded units from humans that are
assembled to form the desired utterance
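Here is a deliberately crude sketch of the formant-synthesis recipe in Python with NumPy: start with a tone at the fundamental frequency and emphasize the harmonics near assumed formant frequencies. The formant values and weighting scheme are simplifying assumptions; real synthesizers use resonant filters.

```python
# Crude formant synthesis: sum harmonics of a 120 Hz "vocal cord" tone,
# boosting those near assumed formant centers (roughly an [a]-like vowel).
import numpy as np

fs, f0, dur = 16000, 120, 0.5                 # sample rate, pitch, duration
t = np.linspace(0, dur, int(fs * dur), endpoint=False)
formants = [700, 1200, 2500]                  # assumed formant centers in Hz

signal = np.zeros_like(t)
for k in range(1, int(fs / 2 / f0)):          # every harmonic below Nyquist
    freq = k * f0
    # emphasize harmonics close to a formant, attenuate the rest
    gain = max(np.exp(-((freq - F) / 150.0) ** 2) for F in formants)
    signal += gain * np.sin(2 * np.pi * freq * t)

signal /= np.abs(signal).max()                # normalize for playback
```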
Text-to-Speech
• Text-to-speech programs convert input text into a phonetic
representation (for formant synthesizers) or a representation of
whatever units are to be combined (for concatenative synthesizers)
• Two problems with text-to-speech programs are:
– 1. Homographs that are pronounced differently
• Complex structural knowledge is required to know whether to pronounce read as
[rid] or [rd] in the following sentences:
– I have read the book
– Which girl did the teacher have read the book?
– 2. Inconsistencies in spelling
• Computers now have enough memory to store every word that is spelled
similarly but pronounced differently, like tough, bough, cough, and dough
• But new words are always being added to each language, and text-to-speech
programs need rules for converting any new word into speech sounds
Computational Morphology
• Computers also need to understand morphology and be able
to identify morphemes
– One strategy would be to compile all the morphological forms of all a
language’s words into a dictionary
• But, the dictionary would constantly be out of date as new words enter
the language
• And not all forms are predictable, so it would be impossible to predict a
new compound like podcast or the plural of fax
– Stemming is the process of detecting affixes and stripping them from
roots to identify morphemes
• For example, the computer would detect and strip the be- and the –ed
from befriended, and then would identify those morphemes in its
dictionary
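A minimal sketch of affix stripping in Python follows; the prefix and suffix lists and the tiny dictionary are toy assumptions, and real stemmers handle spelling changes and many more affixes.

```python
# Toy stemmer: strip a known prefix and suffix and accept the analysis
# if the remaining root is in the dictionary (be- + friend + -ed).
PREFIXES = ["be", "un", "re"]
SUFFIXES = ["ed", "ing", "s"]
LEXICON = {"friend", "tie", "walk", "do"}

def stem(word):
    """Return (prefix, root, suffix) if stripping yields a known root."""
    for pre in [""] + PREFIXES:
        for suf in [""] + SUFFIXES:
            if word.startswith(pre) and word.endswith(suf):
                root = word[len(pre):len(word) - len(suf)]
                if root in LEXICON:
                    return pre, root, suf
    return None

print(stem("befriended"))  # -> ('be', 'friend', 'ed')
```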
Computational Syntax
• Computers must also be able to determine syntactic
structure
– A parser is a program that uses grammar to assign phrase structure
to a string of words
– A top-down parser proceeds by first consulting the grammar rules and then
examining the input string to see if the first word could begin an S
– A bottom-up parser looks at the input string first and then finds phrasal
categories
– When parsing a sentence, a parser may make faulty assumptions
about syntactic categories or structures
• The parser could then backtrack to properly parse the sentence
• Or, the parser may parse all possible structures in parallel, and then
only parses that finish are accepted as valid
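Here is a minimal recursive-descent (top-down) parser sketch in Python for a toy grammar; the rules and lexicon are illustrative assumptions. It consults the grammar first and backtracks, via generators, over alternative expansions.

```python
# Minimal top-down parser: expand grammar rules first, then check the
# input words, backtracking through alternative expansions on failure.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["Det", "N"], ["Pro"]],
    "VP": [["V", "NP"], ["V"]],
}
LEXICON = {"the": "Det", "dog": "N", "cat": "N",
           "you": "Pro", "saw": "V", "slept": "V"}

def parse(cat, words, i):
    """Yield every input position just past a successful parse of cat."""
    if cat in LEXICON.values():                    # lexical category
        if i < len(words) and LEXICON.get(words[i]) == cat:
            yield i + 1
        return
    for expansion in GRAMMAR.get(cat, []):         # try each rule for cat
        positions = [i]
        for sub in expansion:                      # parse each daughter
            positions = [j for p in positions for j in parse(sub, words, p)]
        yield from positions

sentence = "the dog saw you".split()
print(len(sentence) in parse("S", sentence, 0))    # True: a complete S
```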
Computational Syntax
• We also want computers to be able to use PS rules to create
new sentences
• In order to create complex language, the computer must
assign lexical items to the meanings to be expressed and
then arrange these lexical items in order
– In the top-down approach, the system begins with the highest level
category (S) and then works down to the lexical items
– In the bottom-up approach, the system begins with the lexical items
and then combines them into larger and larger units
Computational Syntax
• A transition network composed of nodes
(circles) and arcs (arrows) may be used to
model syntactic processing
– Example: You put up the switch
• First, the computer uses the network
model to create the entire sentence (S)
• Then it must create the subject NP
(you)
• Next it must determine the VP
(put up the switch), etc.
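A minimal sketch of this idea in Python follows: states linked by arcs labeled with word categories accept a sentence if some path through the network consumes all its words. The particular states, arcs, and lexicon are toy assumptions.

```python
# Toy transition network: a sentence is accepted if it drives the
# network from the start state S0 to the final state END.
ARCS = {                                     # state -> [(category, next_state)]
    "S0": [("Pro", "S1")],                   # subject NP: you
    "S1": [("V", "S2")],                     # verb: put
    "S2": [("Prt", "S3"), ("Det", "S4")],    # particle (up) or determiner
    "S3": [("Det", "S4")],
    "S4": [("N", "END")],                    # noun: switch
}
LEXICON = {"you": "Pro", "put": "V", "up": "Prt",
           "the": "Det", "switch": "N"}

def accepts(words, state="S0"):
    if not words:                            # input consumed: are we done?
        return state == "END"
    return any(accepts(words[1:], nxt)       # follow any matching arc
               for cat, nxt in ARCS.get(state, [])
               if LEXICON.get(words[0]) == cat)

print(accepts("you put up the switch".split()))  # -> True
```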
Compositional Semantics
• Compositional semantics is concerned with 1)
producing a semantic representation of the input in
the computer and 2) producing natural language to
represent meanings
– To generate sentences, the computer must find words to
represent the concepts to be conveyed
• Then the syntactic rules will apply to these words
– In order to achieve speech understanding, the computer
must find meanings that fit the words and structures of
the input
Compositional Semantics
• The relationships between words can also be demonstrated
with a network like those used for computational syntax
– This model indicates that you is the doer of the verb (the agent) and the
switch is what the verb acts upon (the theme)
• Or, systems can use formal logic for semantic
representations such as PUT UP(YOU, THE SWITCH)
– The computer can check truth values in this way by checking whether
the pair made up of YOU and THE SWITCH is included in the
set of pairs that represents the meaning of PUT UP (see the sketch below)
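A minimal sketch of this truth-value check in Python: the meaning of PUT UP is modeled as a set of (agent, theme) pairs, and the sentence is true exactly when the pair is in that set. The model pairs are invented for illustration.

```python
# Model-theoretic check for PUT UP(YOU, THE SWITCH): the predicate's
# meaning is the set of (agent, theme) pairs for which it holds.
PUT_UP = {("YOU", "THE SWITCH"), ("MARY", "THE TENT")}  # toy model

def put_up(agent, theme):
    return (agent, theme) in PUT_UP

print(put_up("YOU", "THE SWITCH"))  # -> True
print(put_up("YOU", "THE TENT"))    # -> False
```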
Compositional Pragmatics
• Computers use semantic and pragmatic knowledge to
analyze structurally ambiguous sentences
• Many natural language systems are equipped with some
contextual and world knowledge
• Computers must also engage in reference resolution, or
determining when two expressions refer to the same object
(for example, pronoun use)
– This requires grammatical knowledge and situational context
Computational Sign Language
• Linguists at Boston University are currently
working on computer algorithms that will
recognize sign language just as spoken
language can be recognized
– The signer stands in front of a camera and the
computer recognizes the distinctive features of
sign language such as hand shape, movement,
and orientation
Computer Models of Grammar
• Computers can be programmed to model the
grammar of language
– This forces linguists to be explicit in formulating the
rules of the grammar
– If the program cannot generate a possible grammatical
sentence, then there is an error in the grammar
– If the program generates an ungrammatical sentence,
then there is an error in the grammar
Frequency Analysis, Concordances,
and Collocations
• Computers can be used to:
– do frequency analyses to reveal the most common words
in written (the, of, and, to, a, in, that, is, was, he) and
spoken (I, and, the, to, that, you, it, of, a, know) American
English
– do concordances, which specify the location of any
particular word and its context
– do collocation analyses, which reveal the occurrences of
two or more words within a short space of each other in a
corpus and provide evidence that the presence of one
word in a text affects the occurrence of other words
(sketches of the first two tasks follow below)
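Minimal Python sketches of a word-frequency count and a keyword-in-context concordance follow; the one-sentence corpus is a toy assumption.

```python
# Frequency analysis and a simple keyword-in-context concordance.
from collections import Counter

corpus = "the cat sat on the mat and the dog sat on the log".split()

# 1. Frequency analysis: the most common words in the corpus
print(Counter(corpus).most_common(3))  # [('the', 4), ('sat', 2), ('on', 2)]

# 2. Concordance: every occurrence of a word with surrounding context
def concordance(word, window=2):
    for i, w in enumerate(corpus):
        if w == word:
            print(" ".join(corpus[max(0, i - window):i + window + 1]))

concordance("sat")  # "the cat sat on the" and "the dog sat on the"
```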
Computational Lexicography
• Computational linguists need more information
about words and morphemes than just the
meanings
– The field of computational lexicography is concerned with
making standard dictionaries and dictionaries specifically
for computational linguists
• These special dictionaries contain information about:
– Phonemic transcriptions
– Phonetic variants
– Syllabification
– Syntactic categories
– and more
Information Retrieval and
Summarization
• Information retrieval: the use of computers to locate and
display data from possibly very large databases
– Data mining is the term used for complex information retrieval
• Summarization programs allow computers to eliminate
redundancy and identify the most salient features of a body
of information
– These programs can reduce each article in a corpus of articles by a
certain amount, provide just the topic sentence of each paragraph,
or provide paragraphs based on a concept vector
– A concept vector is a list of meaningful key words whose presence in
a paragraph is a measure of the paragraph’s significance (see the
sketch below)
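Here is a minimal sketch of concept-vector scoring in Python: each paragraph is scored by how many of the listed key words it contains, and the highest-scoring paragraph is kept. The key words and paragraphs are toy assumptions.

```python
# Concept-vector summarization sketch: keep the paragraph whose words
# overlap most with a list of meaningful key words.
CONCEPT_VECTOR = {"language", "computer", "speech"}   # toy key words

def score(paragraph):
    """Number of concept-vector words appearing in the paragraph."""
    return len(set(paragraph.lower().split()) & CONCEPT_VECTOR)

paragraphs = [
    "Computers process speech and language in many ways.",
    "The weather was pleasant for most of the week.",
]
print(max(paragraphs, key=score))   # the most concept-laden paragraph
```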
Spell Checkers and
Machine Translation
• Spell checkers range in sophistication from mindless
dictionary lookups to intelligent flagging of incorrect
homonyms (your for you’re, bear for bare, etc.)
• The goal of automatic machine translation is to input a
message from the source language and have it translated
into the target language
– Translation requires more than just replacing each source language
word with a target language word
– Humans encounter morphological, syntactic, idiomatic, and
metaphorical challenges during translation, and these
challenges are even greater for an electronic translator
Computational Forensic Linguistics
• Computational linguistics can be used in legal disputes regarding
trademarks:
– A computer search proved that the bound morpheme Mc- is now used
productively to mean “basic” or “inexpensive”
– But a judge ruled that another company could not use Mc- for their product
because it was too firmly associated with McDonald’s for consumers
• Computational linguistics can also be used for the interpretation of legal
terms:
– A court case hinged on the meaning of the word visa, and by searching a
multimillion-word corpus, a computational linguist concluded that visa
meant “a kind of permit to enter a country” not “a permit to request
permission to enter a country”
– This finding affects laws surrounding international travel
Speaker Identification
• Speaker identification is the use of computers
to assist in the task of ascertaining the
identity of a speaker
• Displays of wave forms (which show the
amplitude changes of speech over time) and
spectrograms (which show the frequencies of
speech over time) can help provide evidence
in cases needing speaker identification
Speaker Identification
• Consider the following bomb threat:
Good morning. There are three bombs to go off today at three pharmaceuticals in
North Carolina. Please be aware. Advise your people or go to their funerals.
Goodbye.
• In this case, an African American man born and raised in North
Carolina was arrested for making this threat
• But a computational forensic linguist determined that the suspect was
unlikely to be the caller and that the caller was probably not a native
speaker of English
– The caller inserted a vowel so that the pronunciation of goodbye sounded
more like “good-a-bye” which is not likely to be said by a native speaker
– Unlike the caller, the suspect pronounced goodbye without a /d/ and with a
monophthongized final vowel, as is typical of Southern pronunciation