Computer-assisted essay assessment


Computer assisted assessment of essays

• Advantages
  - Reduces the costs of assessment
    - Fewer staff are needed for assessment tasks
  - Increases objectivity
    - More than one assessor can be used without doubling the costs
    - Automated marking is not prone to human error
  - Instant feedback
    - Helps students
  - As accurate as human graders
    - Measured by the correlation between the grades given by humans and by the system (see the sketch after this list)
• Training material
  - The basis of the scores given by the computer
  - Human-graded essays
  - Training is done separately for each assignment
  - Usually 100 to 300 essays are needed
• Surface features, structure, content
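The accuracy claim above is usually checked by correlating human and system grades. A minimal sketch in Python; the grade values are invented for illustration:

```python
import numpy as np

# Grades given by a human assessor and by the system for the same essays
human_grades  = np.array([3, 5, 2, 4, 4, 1])
system_grades = np.array([3, 4, 2, 5, 4, 2])

# Off-diagonal entry of the 2x2 correlation matrix = human-system correlation
r = np.corrcoef(human_grades, system_grades)[0, 1]
print(round(r, 2))
```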
Computer assisted assessment of essays

• Surface features
  - Total number of words per essay
  - Number of commas
  - Average length of words
  - Number of paragraphs
  - The earliest systems were based solely on surface features
• Rhetorical structure
  - Identifying the arguments presented in the essay
  - Measuring coherence
• Content
  - Relevance to the assignment
  - Use of words
Analysis of Essay Content

• Information retrieval methods
  - Vector Space Model
  - Latent Semantic Analysis
  - Naive-Bayes text categorization
• Ways to improve efficiency
  - Stemming, term weighting, use of a stop-word list
• Stemming
  - Reduces the number of index words
  - Reduces different word forms to common roots
  - Finds words that are morphological variants of the same word stem
    - apply -> applying, applies, applied
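A minimal stemming sketch; the slides do not name an algorithm, so NLTK's Porter stemmer is assumed here:

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
# All four morphological variants reduce to the same root ("appli"),
# shrinking the number of index words.
for word in ["apply", "applying", "applies", "applied"]:
    print(word, "->", stemmer.stem(word))
```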
Analysis of Essay Content

• Term weighting
  - Raw word frequencies are transformed so that they tell more about the words' importance in the context
  - Amplifies the influence of words that occur often in a document but relatively rarely in the whole collection of documents
  - Information retrieval effectiveness can be improved significantly
  - Term frequency - inverse document frequency (TF-IDF), entropy
  - Log-entropy weighting combines a local and a global term weight:

    $M_{ij} = \underbrace{\log(\mathrm{freq}_{ij}+1)}_{\text{local term weight}} \cdot \underbrace{\Bigl(1 + \sum_{j=1}^{n} \frac{p_{ij}\log p_{ij}}{\log n}\Bigr)}_{\text{global term weight (entropy)}}, \qquad p_{ij} = \frac{\mathrm{freq}_{ij}}{\sum_{j=1}^{n}\mathrm{freq}_{ij}}$

    where freq_ij is the count of word i in document j and n is the number of documents
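A sketch of the log-entropy weighting defined above, assuming natural logarithms and the usual convention 0 · log 0 = 0; the toy frequency matrix is invented:

```python
import numpy as np

def log_entropy_weight(freq):
    """Apply log-entropy weighting to a word-by-document count matrix.

    freq[i, j] = raw count of word i in document j. The local weight is
    log(freq + 1); the global (entropy) weight approaches 0 for words
    spread evenly over the collection and 1 for concentrated words.
    """
    n = freq.shape[1]                         # number of documents
    gf = freq.sum(axis=1, keepdims=True)      # global frequency of each word
    p = np.where(freq > 0, freq / gf, 1.0)    # p_ij; 1.0 makes log(p) = 0
    entropy = (p * np.log(p)).sum(axis=1) / np.log(n)   # in [-1, 0]
    return np.log(freq + 1.0) * (1.0 + entropy)[:, None]

freq = np.array([[3.0, 0.0, 0.0],   # concentrated word: full weight
                 [1.0, 1.0, 1.0]])  # evenly spread word: weight ~ 0
print(log_entropy_weight(freq).round(2))
```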
• Stop-word list
  - Removes the most common words
    - For example prepositions, conjunctions, pronouns and articles (a, an, the, and, or, ...)
  - Common words add no additional meaning to the content of the text
  - Saves processing time and working memory
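A stop-word filtering sketch; the word list below is a small invented sample, not a standard list:

```python
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "in", "to", "is"}

def remove_stop_words(tokens):
    # Drop the most common words; they carry little content information
    return [t for t in tokens if t.lower() not in STOP_WORDS]

print(remove_stop_words("the essay is an analysis of the topic".split()))
# ['essay', 'analysis', 'topic']
```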
Comparison of essay evaluation systems

• Assessment systems
  - Project Essay Grade (PEG)
  - Text Categorization Technique (TCT)
  - Latent Semantic Analysis (LSA)
  - Electronic Essay Rater (E-Rater)

              Grading simulation   Master analysis
    Content   LSA, TCT             E-Rater
    Style     PEG, TCT             E-Rater
• Content refers to what the essay says; style refers to the way it is said
• A system can simulate the score without great concern about the way it was produced (grading simulation), or measure the intrinsic variables of the essay (master analysis)
Project Essay Grade (PEG)

• One of the earliest implementations of automated essay grading
• Development began in the 1960s
• Relies primarily on surface features; no natural language processing is used
  - Average word length
  - Number of commas
  - Standard deviation of word length
• Regression model based on the training material
  - Scoring by using the regression equation (a sketch follows below)
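A minimal sketch of PEG-style scoring: an ordinary least-squares regression from the surface features named above to human grades. The training numbers are invented, and the real PEG used many more features:

```python
import numpy as np

# Rows: training essays; columns: average word length, number of commas,
# standard deviation of word length (made-up values)
X_train = np.array([[4.2, 12, 1.9],
                    [5.1, 30, 2.4],
                    [4.7, 21, 2.1],
                    [5.4, 35, 2.6]])
y_train = np.array([2.0, 5.0, 3.5, 5.5])   # human grades (training material)

# Fit grade = b0 + b1*x1 + b2*x2 + b3*x3 by least squares
A = np.hstack([np.ones((len(X_train), 1)), X_train])
coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)

def peg_score(features):
    """Score a new essay with the fitted regression equation."""
    return coef[0] + coef[1:] @ np.asarray(features)

print(round(peg_score([4.9, 25.0, 2.2]), 2))
```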
Text Categorization Technique (TCT)

• Measures both content and style
• Uses a combination of key words and text complexity features
  - Naive-Bayes categorization
• Assessment of content
  - Analysis of the occurrence of certain key words in the documents
  - Probabilities estimate the likelihood that an essay belongs to a specified grade category (see the sketch after this list)
• Text complexity features
  - Assessment of style
  - Surface features
    - Number of words
    - Average length of words
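A sketch of the content-assessment step using Naive-Bayes, as listed above; scikit-learn's MultinomialNB stands in for whatever implementation TCT actually used, and the tiny corpus is invented:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_essays = [
    "photosynthesis converts light energy into chemical energy",
    "plants use sunlight water and carbon dioxide to make glucose",
    "this essay wanders far away from the assigned topic",
    "an unrelated discussion of weekend football results",
]
train_grades = ["high", "high", "low", "low"]   # grade categories

vectorizer = CountVectorizer()                  # key-word occurrence counts
X = vectorizer.fit_transform(train_essays)

model = MultinomialNB()
model.fit(X, train_grades)

# Probability that a new essay belongs to each grade category
new = vectorizer.transform(["sunlight drives photosynthesis in plants"])
print(dict(zip(model.classes_, model.predict_proba(new)[0].round(2))))
```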
E-Rater

• A hybrid approach combining linguistic features with other document structure features
  - Syntax, discourse structure and content
• Syntactic features
  - Measure syntactic variety
  - Ratios of different clause types
  - Use of modal verbs
• Discourse structure
  - Measures how well the writer has been able to organize the ideas
  - Identifies the arguments in the essay by searching for "cue" words or terms that signal where an argument begins and how it has been developed (see the sketch after this list)
• Content
  - Analyzes how relevant the essay is to the topic by considering the use of words
  - Vector Space Model
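An illustrative sketch of the cue-word search described under discourse structure; the cue lexicon below is an assumption, since the slides do not list E-Rater's actual cues:

```python
import re

CUE_WORDS = ["first", "secondly", "however", "for example", "in conclusion"]

def find_cues(essay):
    """Return (cue, position) pairs marking where arguments may begin."""
    text = essay.lower()
    return [(m.group(), m.start())
            for cue in CUE_WORDS
            for m in re.finditer(r"\b" + re.escape(cue) + r"\b", text)]

print(find_cues("First, consider cost. However, quality also matters. "
                "In conclusion, both are needed."))
```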
Latent Semantic Analysis (LSA)
aka Latent Semantic Indexing (LSI)

• Several applications
  - Information retrieval
  - Information filtering
  - Essay assessment
• Issues in information retrieval
  - Synonyms are separate words that have the same meaning. They tend to reduce recall.
    - For example: football, soccer
  - Polysemy refers to words that have multiple meanings. This problem tends to reduce precision.
    - For example: "foot" as the lower part of the leg, as the bottom of a page, or as a specific metrical measure
  - Both issues point to a more general problem: there is a disconnect between topics and keywords
• LSA attempts to discover information about the meaning behind words
• LSA is proposed as an automated solution to the problems of synonymy and polysemy
Latent Semantic Analysis (LSA)

• Documents are represented as a matrix in which each row stands for a unique word and each column stands for a text passage (word-by-document matrix)
• Truncated singular value decomposition is used to model the latent semantic structure
• The resulting semantic space is used for retrieval
  - Can retrieve documents that share no words with the query
• Singular Value Decomposition
  - Reduces the dimensionality of the word-by-document matrix
  - Using a reduced dimension, new relationships between words and contexts are induced when reconstructing a close approximation to the original matrix
  - These new relationships are made manifest, whereas prior to the SVD they were hidden or latent
  - Reduces irrelevant data and "noise" (see the sketch after this list)
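A sketch of truncated SVD on a small word-by-document matrix using numpy; the matrix and the choice k = 2 are illustrative:

```python
import numpy as np

# Rows: words, columns: documents (invented counts)
A = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 1, 0],
              [0, 0, 0, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2                                        # reduced dimensionality
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # closest rank-k approximation

# Cells that were 0 in A acquire nonzero values: relationships between
# words and contexts that were latent are now made manifest.
print(A_k.round(2))
```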
Latent Semantic Analysis (LSA)

• Word-by-document matrix [figure omitted]
• Singular value decomposition [figure omitted]
• Two-dimensional reconstruction of the word-by-document matrix [figure omitted]
Latent Semantic Analysis (LSA)

• The semantic space is constructed from the training material
  - To grade an essay, a matrix for the essay document is built
  - The document vector of the essay is compared to the semantic space
Word-by-document matrix and query vector:

          doc1   doc2   doc3   ...   docn  |  query
    t1    w11    w12    w13    ...   w1n   |  qw1
    t2    w21    w22    w23    ...   w2n   |  qw2
    t3    w31    w32    w33    ...   w3n   |  qw3
    ...   ...    ...    ...    ...   ...   |  ...
    tm    wm1    wm2    wm3    ...   wmn   |  qwm

Similarity scores: the similarity between each document vector and the query vector is computed, giving S1, S2, S3, ..., Sn for doc1, doc2, doc3, ..., docn.
• The grade is determined by averaging the grades of the most similar essays (a sketch follows below)
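A sketch of the whole grading step under stated assumptions: the essay is folded into the semantic space with the standard LSI projection d_k = inv(Sigma_k) * U_k^T * d (the slides do not spell this out), similarity is the cosine measure from the next slide, and the data are invented:

```python
import numpy as np

def grade_essay(train_matrix, train_grades, essay_vector, k=2, top=2):
    """Fold a new essay into the LSA space built from human-graded
    training essays and average the grades of the `top` most similar."""
    U, s, Vt = np.linalg.svd(train_matrix, full_matrices=False)
    U_k, s_k = U[:, :k], s[:k]
    docs_k = Vt[:k, :].T                     # rows: training-essay coordinates
    essay_k = (U_k.T @ essay_vector) / s_k   # folding-in projection

    # Cosine similarity between the new essay and each training essay
    sims = docs_k @ essay_k
    sims = sims / (np.linalg.norm(docs_k, axis=1) * np.linalg.norm(essay_k))

    nearest = np.argsort(sims)[::-1][:top]   # indices of most similar essays
    return float(np.mean(np.asarray(train_grades)[nearest]))

# 5 index words x 4 human-graded training essays (invented counts)
train = np.array([[2, 0, 1, 0],
                  [1, 1, 0, 0],
                  [0, 2, 1, 0],
                  [0, 0, 2, 1],
                  [0, 0, 0, 2]], dtype=float)
grades = [5, 4, 3, 1]
new_essay = np.array([1, 1, 1, 0, 0], dtype=float)  # word counts of new essay
print(grade_essay(train, grades, new_essay))
```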
Latent Semantic Analysis (LSA)

• Document comparison
  - Euclidean distance
  - Dot product
  - Cosine measure
• Cosine between document vectors X and Y:

  $\cos\theta = \frac{X \cdot Y}{\|X\|\,\|Y\|}$

  - The dot product of the vectors divided by the product of their lengths
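The cosine measure above in code; the two vectors are arbitrary examples:

```python
import numpy as np

def cosine(x, y):
    """Dot product of the vectors divided by the product of their lengths."""
    return (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))

print(round(cosine(np.array([1.0, 2.0, 0.0]),
                   np.array([2.0, 4.0, 1.0])), 3))   # ~0.976
```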
Latent Semantic Analysis (LSA)

• Pros
  - Doesn't just match on terms; tries to match on concepts
• Cons
  - Computationally expensive: it is not cheap to compute singular values
  - The choice of dimensionality is somewhat arbitrary and is done by experimentation
• Precision comparison of LSA and the Vector Space Model at 10 recall levels [figure omitted]