What is Leximancer?

From Words to Meaning to Insight
Julia Cretchley & Mike Neal
Outline
▪ Content Analysis
▪ What is Leximancer?
▪ Steps to your first analysis
▪ In-depth Leximancer
What Is Leximancer?
▪ Leximancer is a software tool designed for analyzing natural language text data
▪ Uses statistics-based algorithms
• Initial analysis in minutes
▪ Automatically analyzes a text collection
• User can direct the search and add, remove, or merge terms
▪ Extracts semantic (meaning) and relational information (covered later)
▪ Outputs include a concept map, network cloud, quantitative data, and a concept thesaurus
Leximancer Overview
[Diagram: Leximancer overview, starting from the input text]
Let’s Look at Some Text
"We use the Laser 500 printer here at the office. We are pretty
happy with it. Once there was a leak and all the toner spilled
out of the machine, but a technician came out and fixed the
problem for us. We still have to top the toner up often. The
printer goes through ink quickly and the cartridges are
expensive, but we put up with this because it delivers good
results reliably. We are pleased with the quality of printing we
get. The Laser 500 can batch process, and collate the pages to
save us time. Sometimes paper gets jammed in the Laser 500.
Then we have to open it up to remove the crumpled paper.
We have tried other machines in the past, but have not found
an alternative that works better for us.”
What is this text about? (one main topic)
Concept Extraction
▪ The terms around a word indicate its meaning
▪ Word associations reveal concepts, so the approach is language independent
▪ Leximancer concept: a group of related words that travel together in the text
• Evidence words include synonyms and adjectives
▪ Concepts begin as seed words for coding and evolve into a thesaurus
• Concepts can be word-like, name-like (proper nouns), or compounds (United States)
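The idea of seed words growing into a thesaurus can be illustrated with a toy co-occurrence filter. This is a minimal sketch, not Leximancer's actual learning algorithm. It assumes the text has already been split into blocks of words. The function name `expand_seed` and the `min_ratio`/`min_count` cut-offs are hypothetical. A word is proposed as evidence for a seed when it shares enough blocks with the seed and mostly travels with it.

```python
from collections import Counter


def expand_seed(blocks: list[set[str]], seed: str,
                min_ratio: float = 0.6, min_count: int = 2) -> dict[str, float]:
    """Hypothetical helper: propose evidence words for a seed word.

    A word qualifies if it shares at least `min_count` blocks with the seed
    and at least `min_ratio` of the blocks containing it also contain the seed.
    Returns {word: share of the word's blocks that also contain the seed}.
    """
    total = Counter()      # number of blocks containing each word
    with_seed = Counter()  # number of blocks containing each word and the seed
    for block in blocks:
        total.update(block)
        if seed in block:
            with_seed.update(block)

    candidates = {}
    for word, shared in with_seed.items():
        if word == seed or shared < min_count:
            continue
        ratio = shared / total[word]
        if ratio >= min_ratio:
            candidates[word] = ratio
    return candidates


blocks = [
    {"printer", "toner", "leak"},
    {"printer", "toner", "ink"},
    {"printer", "cartridges", "ink"},
    {"paper", "jammed"},
    {"paper", "crumpled", "jammed"},
    {"toner", "office"},
]
print(expand_seed(blocks, "printer"))  # proposes toner (2 of 3 blocks) and ink (2 of 2)
```

Here `leak` and `cartridges` are dropped because each shares only one block with the seed. Leximancer's own thesaurus learning is more involved than this single pass.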
Concept Extraction (cont.)
▪ A few things to note...
• Several concepts may be in a single sentence
• A concept may span multiple sentences
• Adjustable resolution (default: 2 sentences)
• Stop lists remove common words (the, and)
▪ Algorithms
• A threshold amount of evidence for a concept must be present before the concept is coded in a block of text
• A concept can be coded from its evidence words even if the seed word itself (printer) is not present
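The thresholded coding rule can be sketched as a weighted sum over evidence words. This sketch makes three assumptions:
- Each concept's thesaurus maps evidence words to weights.
- Compound terms such as `laser 500` arrive as single tokens.
- The names `code_block` and `THESAURUS`, every weight, and the threshold of 1.0 are made up for illustration. They are not Leximancer's values.

```python
def code_block(block_words: set[str],
               thesaurus: dict[str, dict[str, float]],
               threshold: float = 1.0) -> set[str]:
    """Return the concepts whose summed evidence in the block reaches `threshold`."""
    coded = set()
    for concept, evidence in thesaurus.items():
        # Add up the weight of every evidence word that appears in this block.
        score = sum(weight for word, weight in evidence.items() if word in block_words)
        if score >= threshold:
            coded.add(concept)
    return coded


# Illustrative (hypothetical) weights; a real thesaurus is learned from the text.
THESAURUS = {
    "printer": {"printer": 1.0, "laser 500": 0.8, "toner": 0.6,
                "machine": 0.5, "printing": 0.5},
    "paper": {"paper": 1.0, "pages": 0.7, "crumpled": 0.5, "jammed": 0.6},
}

# "printer" never appears, but toner + machine (0.6 + 0.5) clears the threshold.
print(code_block({"toner", "machine", "leak", "technician"}, THESAURUS))  # {'printer'}
```

A single weak hit such as `jammed` alone (0.6) is not enough to code `paper`. That is the reason for requiring a threshold rather than accepting any single match.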
Concept Extraction: Units of Resolution
"We use the laser 500 printer here at the office. We are pretty
happy with it. Once there was a leak and all the toner spilled
out of the machine, but a technician came out and fixed the
problem for us. We still have to top the toner up often. The
printer goes through ink quickly and the cartridges are
expensive, but we put up with this because it delivers good
results reliably. We are pleased with the quality of printing we
get. The laser 500 can batch process, and collate the pages to
save us time. Sometimes paper gets jammed in the laser 500.
Then we have to open it up to remove the crumpled pages. We
have tried other machines in the past, but have not found an
alternative that works better for us.”
Leximancer divides the text into two-sentence blocks (configurable)
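Below is a rough sketch of block segmentation with a stop list. It assumes a naive sentence splitter that breaks after `.`, `!` or `?`, plus a tiny illustrative stop list. The names `make_blocks`, `STOP_WORDS`, and `sentences_per_block` are hypothetical. Leximancer's own tokenizer and stop list are more thorough.

```python
import re

# Tiny illustrative stop list; real stop lists are much longer.
STOP_WORDS = {"a", "an", "and", "are", "at", "is", "it", "of",
              "the", "to", "us", "we", "with"}


def split_sentences(text: str) -> list[str]:
    """Naively split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def make_blocks(text: str, sentences_per_block: int = 2) -> list[list[str]]:
    """Group consecutive sentences into blocks and drop stop words."""
    sentences = split_sentences(text)
    blocks = []
    for start in range(0, len(sentences), sentences_per_block):
        chunk = " ".join(sentences[start:start + sentences_per_block])
        words = re.findall(r"[a-z0-9]+", chunk.lower())
        blocks.append([w for w in words if w not in STOP_WORDS])
    return blocks


text = ("We use the printer. It is pretty good. The toner leaked. "
        "A technician fixed it. Paper jams.")
for block in make_blocks(text):
    print(block)  # three blocks; the last one is ['paper', 'jams']
```

This tokenizer would split "Laser 500" into `laser` and `500`, so compound terms would need a separate step.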
Concept Extraction: Units of Resolution
"We use the Laser 500 printer here at the office. We are pretty
happy with it. Once there was a leak and all the toner spilled
out of the machine, but a technician came out and fixed the
problem for us. We still have to top the toner up often. The
printer goes through ink quickly and the cartridges are
expensive, but we put up with this because it delivers good
results reliably. We are pleased with the quality of printing we
get. The Laser 500 can batch process, and collate the pages to
save us time. Sometimes paper gets jammed in the Laser 500.
Then we have to open it up to remove the crumpled paper.
We have tried other machines in the past, but have not found
an alternative that works better for us.”
Printer concept: Laser 500, toner, machine, printing
Paper concept: pages, crumpled, jammed
Semantic and Relational Analysis
▪ Semantic meaning is created through conceptual analysis
• Presence and frequency of words and phrases
• Co-occurrence of words makes a concept
• Explicit and implicit concepts are identified (tsunami and earthquake imply Japan)
▪ Relationships are created through concept co-occurrence
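Counting concept co-occurrence can be sketched in a few lines, assuming each block has already been coded with a set of concepts. The function name `concept_cooccurrence` is hypothetical. Raw counts are only a starting point; how Leximancer normalizes them is not covered here.

```python
from collections import Counter
from itertools import combinations


def concept_cooccurrence(coded_blocks: list[set[str]]) -> tuple[Counter, Counter]:
    """Count how often each concept, and each pair of concepts, is coded in a block."""
    frequency = Counter()
    cooccurrence = Counter()
    for concepts in coded_blocks:
        frequency.update(concepts)
        # Sort so each pair has one canonical key, e.g. ('paper', 'printer').
        for pair in combinations(sorted(concepts), 2):
            cooccurrence[pair] += 1
    return frequency, cooccurrence


coded_blocks = [{"printer", "ink"}, {"printer", "paper"},
                {"paper"}, {"printer", "paper", "ink"}]
frequency, cooccurrence = concept_cooccurrence(coded_blocks)
print(cooccurrence[("paper", "printer")])  # 2
```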
Themes and Concept Map
▪ Themes
• A collection of related concepts in close proximity on the map
• The theme name is its most prominent concept
▪ Concept map display
• Dot size indicates frequency of occurrence
• Lines between concepts show relationships
• Map proximity reflects shared connections, much like mutual contacts on LinkedIn
▪ The concept map becomes an interface for exploring the underlying text
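Three ideas from this slide can be illustrated with toy functions:
- Dot size as frequency relative to the most frequent concept.
- The "shared friends" analogy as the Jaccard overlap of two concepts' neighbour sets.
- Naming a theme after its most frequent concept.

These are hypothetical helpers (`dot_sizes`, `shared_neighbour_similarity`, `name_theme`), and the theme groups are supplied by hand. Leximancer's actual map layout and clustering are not reproduced here.

```python
def dot_sizes(frequency: dict[str, int]) -> dict[str, float]:
    """Scale each concept's dot relative to the most frequent concept."""
    top = max(frequency.values())
    return {concept: count / top for concept, count in frequency.items()}


def neighbours(concept: str, edges: set[tuple[str, str]]) -> set[str]:
    """Concepts linked to `concept` by at least one edge."""
    return {b if a == concept else a for a, b in edges if concept in (a, b)}


def shared_neighbour_similarity(a: str, b: str, edges: set[tuple[str, str]]) -> float:
    """Jaccard overlap of two concepts' neighbours ('shared friends')."""
    na = neighbours(a, edges) - {b}
    nb = neighbours(b, edges) - {a}
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


def name_theme(group: set[str], frequency: dict[str, int]) -> str:
    """Name a theme after its most frequent concept (ties go to the alphabetically first)."""
    return max(sorted(group), key=lambda concept: frequency[concept])


frequency = {"printer": 5, "toner": 3, "ink": 2, "paper": 4, "jammed": 1}
edges = {("printer", "toner"), ("printer", "ink"),
         ("paper", "ink"), ("paper", "toner"), ("paper", "jammed")}
print(dot_sizes(frequency)["jammed"])                      # 0.2
print(shared_neighbour_similarity("printer", "paper", edges))
print(name_theme({"paper", "jammed"}, frequency))          # paper
```

Printer and paper are not directly linked here, yet they share two of three neighbours (toner, ink). In the analogy, that shared overlap is what would pull them close together on a map.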
Concept and Theme Creation
Evidence words (thesaurus) → concepts
• Laser 500, machine, toner, printing → printer
• pages, crumpled, jammed → paper
2 co-occurrences of printer and paper
Additional Features
▪ Thesaurus (coding dictionary) is automatically generated
• No manual coding required
• Profiling and directed coding supported
▪ Analysts can seed their own terms
▪ Sentiment lens feature for affective analysis
▪ Discourse analysis of speakers supported
▪ Survey data analysis supported
Key Points Summary
▪ Automated, statistical approach
• How would you do this manually?
• No manual data management, dictionary creation, or updates
▪ User does not have to formulate a coding scheme
• This saves time, and
• Avoids introducing researcher bias (grounded theory)
▪ Nuances, subtleties, and distinctions in expression
• The word-association approach is the most likely to identify these
▪ Evidence words linked back to the text allow deeper exploration and documentation of findings
Questions?