Computer Assisted Assessment of Essays
Advantages
Reduces costs of assessment
Fewer staff are needed for assessment tasks
Increases objectivity
More than one assessor can be used without
doubling the costs
Automated marking is not prone to human error
Instant feedback
Helps students
As accurate as human graders
Measured by the correlation between grades given by
humans and by the system
Training material
Basis of scores given by computer
Human graded essays
Training is done separately for each assignment
Usually 100 to 300 essays are needed
Surface features, structure, content
Computer assisted assessment of
essays
Surface Features
Total number of words per essay
Number of commas
Average length of words
Number of paragraphs
The earliest systems were based solely on surface
features
Rhetorical Structure
Identifying the arguments presented in the essay
Measuring coherence
Content
Relevance to the assignment
Use of words
Analysis of Essay Content
Information retrieval methods
Vector Space Model
Latent Semantic Analysis
Naive-Bayes text categorization
Ways to improve efficiency
Stemming, term weighting, use of stop-word list
Stemming
Reduces the amount of index words
Reducing different word forms to common roots
Finding words that are morphological variants of
the same word stem
• applying, applies, applied -> apply
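A minimal sketch of suffix-stripping stemming. The suffix rules below are illustrative only; a real system would use a full algorithm such as Porter's stemmer.

```python
# Illustrative suffix-stripping stemmer: reduces morphological
# variants to a common root by trying the longest suffix first.
def stem(word):
    for suffix, replacement in (("ied", "y"), ("ies", "y"), ("ying", "y"),
                                ("ing", ""), ("ed", ""), ("es", ""), ("s", "")):
        # Require at least 3 characters of stem so short words survive intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)] + replacement
    return word

print([stem(w) for w in ["apply", "applying", "applies", "applied"]])
# -> ['apply', 'apply', 'apply', 'apply']
```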
Analysis of Essay Content
Term weighting
Raw word frequencies are transformed so that they
better reflect each word's importance in context
Amplifies the influence of words that occur often in
a document but relatively rarely in the whole collection
of documents
Information retrieval effectiveness can be improved
significantly
Term frequency - inverse document frequency (TF-IDF), Entropy
Log-entropy weighting:

M_ij = log(freq_ij + 1) * ( 1 + ( Σ_j p_ij log p_ij ) / log n ),  where  p_ij = freq_ij / Σ_j freq_ij

log(freq_ij + 1) is the local term weight; the parenthesized factor is the
global term weight (entropy), with n the number of documents
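The entropy weighting described above can be sketched in a few lines; this is a minimal illustration over a toy count matrix, not a production weighting routine.

```python
import math

def log_entropy(freq):
    """freq[i][j] = raw count of term i in document j; returns weighted matrix M."""
    n = len(freq[0])                      # number of documents
    M = []
    for row in freq:
        gf = sum(row)                     # global frequency of the term
        # Entropy-based global weight: 1 for a term concentrated in one
        # document, approaching 0 for a term spread evenly over all documents.
        g = 1.0 + sum((f / gf) * math.log(f / gf) for f in row if f) / math.log(n)
        M.append([g * math.log(f + 1.0) for f in row])
    return M

counts = [[3, 0, 0],   # concentrated term -> keeps its weight
          [1, 1, 1]]   # evenly spread term -> weight driven toward 0
M = log_entropy(counts)
```

Note how the evenly spread term's weights collapse toward zero: it carries little information about which document it is in.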
Stop-word list
Removing the most common words
• For example prepositions, conjunctions, pronouns and
articles (a, an, the, and, or, ...)
Common words have no additional meaning to the
content of the text
Saves processing time and working memory
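Stop-word filtering is a one-line operation once a list exists; the tiny list below is illustrative, as real systems use much larger curated lists.

```python
# Illustrative stop-word list; production systems use larger curated lists.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "it"}

def remove_stop_words(text):
    # Keep only words that carry content; common function words are dropped.
    return [w for w in text.lower().split() if w not in STOP_WORDS]

print(remove_stop_words("The use of words in the essay"))
# -> ['use', 'words', 'essay']
```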
Comparison of Essay evaluation
systems
Assessment systems
Project Essay Grade (PEG)
Text Categorization Technique (TCT)
Latent Semantic Analysis (LSA)
Electronic Essay Rater (E-Rater)
          Grading simulation   Master analysis
Content   LSA, TCT             E-Rater
Style     PEG, TCT             E-Rater
Content refers to what the essay says and style refers to the
way it is said
A system can either simulate the score without great concern for
how it was produced (grading simulation) or measure
the intrinsic variables of the essay (master analysis)
Project Essay Grade (PEG)
One of the earliest implementations of automated
essay grading
Development began in the 1960s
Relies primarily on surface features; no natural
language processing is used
Average word length
Number of commas
Standard deviation of word length
Regression model based on training material
Scoring by using regression equation
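A PEG-style scorer can be sketched as surface features plugged into a regression equation. The coefficients below are hypothetical placeholders; PEG estimates them from human-graded training essays by linear regression.

```python
# Hypothetical coefficients -- in PEG these are fitted to
# human-graded training essays by linear regression.
COEFFS = {"intercept": 1.0, "avg_word_len": 0.5, "n_commas": 0.1, "n_words": 0.01}

def surface_features(essay):
    words = essay.split()
    return {
        "avg_word_len": sum(len(w.strip(",.")) for w in words) / len(words),
        "n_commas": essay.count(","),
        "n_words": len(words),
    }

def score(essay):
    # Scoring by using the regression equation over surface features.
    f = surface_features(essay)
    return COEFFS["intercept"] + sum(COEFFS[k] * v for k, v in f.items())
```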
Text Categorization Technique (TCT)
Measures both content and style
Uses a combination of key words and text
complexity features
Naive-Bayes categorization
Assessment of content
Analysis of the occurrence of certain key words in the
documents
Probabilities estimate the likelihood that an essay belongs
to a specified grade category
Text Complexity Features
Assessment of style
Surface features
Number of words
Average length of words
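The Naive-Bayes step can be sketched as a word-count classifier over grade categories with Laplace smoothing. This is an illustrative sketch, not the TCT implementation; the tiny training set is invented for the example.

```python
import math
from collections import Counter

def train(graded_essays):
    """graded_essays: list of (grade, text). Returns word counts, priors, vocabulary."""
    word_counts, grade_counts, vocab = {}, Counter(), set()
    for grade, text in graded_essays:
        grade_counts[grade] += 1
        wc = word_counts.setdefault(grade, Counter())
        for w in text.lower().split():
            wc[w] += 1
            vocab.add(w)
    return word_counts, grade_counts, vocab

def classify(text, word_counts, grade_counts, vocab):
    n = sum(grade_counts.values())
    best, best_lp = None, float("-inf")
    for grade, wc in word_counts.items():
        lp = math.log(grade_counts[grade] / n)   # prior for the grade category
        total = sum(wc.values())
        for w in text.lower().split():
            # Laplace-smoothed likelihood of the key word given the grade.
            lp += math.log((wc[w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = grade, lp
    return best

essays = [("A", "clear argument strong evidence"),
          ("A", "strong coherent argument"),
          ("C", "vague claim weak evidence"),
          ("C", "weak vague argument")]
model = train(essays)
print(classify("strong coherent evidence", *model))  # -> A
```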
E-Rater
A hybrid approach combining linguistic features
with other document structure features
Syntax, discourse structure and content
Syntactic features
Measures the syntactic variety
Ratios of different clause types
Use of modal verbs
Discourse structure
Measures how well the writer has organized the
ideas
Identifies the arguments in the essay by searching for "cue"
words or terms that signal where an argument begins
and how it is developed
Content
Analyzes how relevant the essay is to the topic by
considering the use of words
Vector Space Model
Latent Semantic Analysis (LSA)
aka
Latent Semantic Indexing (LSI)
Several Applications
Information Retrieval
Information Filtering
Essay Assessment
Issues in Information Retrieval
Synonyms are separate words that have the same
meaning. They tend to reduce recall.
For example: Football, soccer
Polysemy refers to words that have multiple meanings.
This problem tends to reduce precision.
For example: "foot" as the lower part of the leg or as the
bottom of a page or as a specific metrical measure
Both issues point to a more general problem
There is a disconnect between topics and keywords
LSA attempts to discover information about the
meaning behind words
LSA is proposed as an automated solution to the
problems of synonymy and polysemy
Latent Semantic Analysis (LSA)
Documents are presented as a matrix in which
each row stands for a unique word and each
column stands for a text passage (word-by-document matrix)
Truncated singular value decomposition is used to
model latent semantic structure
Resulting semantic space is used for retrieval
Can retrieve documents that share no words with
the query
Singular Value Decomposition
Reduces the dimensionality of word-by-document matrix
Using a reduced number of dimensions, new relationships between
words and contexts are induced when reconstructing a close
approximation to the original matrix
These new relationships are made manifest, whereas prior
to the SVD, they were hidden or latent
Reduces irrelevant data and “noise”
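The dimension-reduction step can be illustrated with a rank-1 truncation computed by power iteration; this is a toy sketch, as real LSA uses a full SVD routine and keeps on the order of 100-300 dimensions. The toy matrix (terms "football", "soccer", "game" over two documents) is invented for the example.

```python
import math

def rank1_approx(A, iters=100):
    """Closest rank-1 matrix to A, via power iteration on A^T A."""
    m, n = len(A), len(A[0])
    v = [1.0] * n
    for _ in range(iters):
        # v <- A^T A v, normalized: converges to the top right singular vector.
        u = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
        v = [sum(A[i][j] * u[i] for i in range(m)) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    u = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
    s = math.sqrt(sum(x * x for x in u))         # top singular value
    u = [x / s for x in u]
    # Reconstruction B = s * u v^T makes latent relationships manifest.
    return [[s * u[i] * v[j] for j in range(n)] for i in range(m)]

# Rows: "football", "soccer", "game"; columns: doc1, doc2.
A = [[1, 0], [0, 1], [1, 1]]
B = rank1_approx(A)
```

In the reconstruction, "football" gets a nonzero weight in doc2 even though it never occurs there, because it co-occurs with "game", which appears in both documents: the synonym relationship was latent before the truncation.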
Latent Semantic Analysis (LSA)
Word-by-document matrix
Latent Semantic Analysis (LSA)
Singular value decomposition
Latent Semantic Analysis (LSA)
Two-dimensional reconstruction of word-by-document matrix
Latent Semantic Analysis (LSA)
Semantic space is constructed from the training
material
To grade an essay, a matrix for the essay document is
built
Document vector of essay is compared to the semantic
space
Word-by-document matrix with a query vector:

        doc1   doc2   doc3   ...   docn   Query vector
t1      w11    w12    w13    ...   w1n    qw1
t2      w21    w22    w23    ...   w2n    qw2
t3      w31    w32    w33    ...   w3n    qw3
...     ...    ...    ...    ...   ...    ...
tm      wm1    wm2    wm3    ...   wmn    qwm

Compute similarity between document vectors and the query
vector:

Similarity scores:  S1 (doc1)   S2 (doc2)   S3 (doc3)   ...   Sn (docn)
Grade is determined by averaging the grades of the
most similar essays
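The last step above can be sketched as a k-nearest-neighbour average over cosine similarities; the document vectors and grades below are invented for the example, and k=2 is an arbitrary choice.

```python
import math

def cosine(x, y):
    # Dot product of the vectors divided by the product of their lengths.
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

def grade(query_vec, graded, k=2):
    """graded: list of (doc_vector, grade). Average the k most similar essays' grades."""
    ranked = sorted(graded, key=lambda dg: cosine(query_vec, dg[0]), reverse=True)
    return sum(g for _, g in ranked[:k]) / k

graded = [([1, 2, 0], 5), ([1, 1, 0], 4), ([0, 0, 3], 1)]
print(grade([2, 3, 0], graded))  # -> 4.5
```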
Latent Semantic Analysis (LSA)
Document comparison
Euclidean distance
Dot product
Cosine measure
Cosine between document vectors X and Y:

cos θ = (X · Y) / (|X| |Y|)

Dot product of the vectors divided by the product of their lengths
Latent Semantic Analysis (LSA)
Pros
Doesn’t just match on terms, tries to match on concepts
Cons
Computationally expensive; it's not cheap to compute
singular values
Choice of dimensionality is somewhat arbitrary, done by
experimentation
Precision comparison of LSA and Vector Space Model at 10
recall levels