
A linear least squares
framework for learning
ordinal classes
Ioannis Mariolis, PhD
Outline
• Introduction to Ordinal Data Modeling
• Generalized Linear Models
– Ordinary Least Squares (OLS) Regression
– Ordinal Logistic Regression (OLR)
• Linear Classifier of Ordinal Classes
– learns a linear model
• modifies OLS regression
• Experimental Results
– synthetic datasets
– real datasets
• visual features
• textile seam quality control
• Conclusions
Intro
Ordinal Data Modeling
• Collection of measurements called
data
• Building a model to fit the data
• The term ordinal refers to the scale of
measurement of the data
Intro
Scales of Measurement
• Measurement is the assignment of
numbers to objects or events in a
systematic fashion
• Four levels of measurement scales are
commonly distinguished
– Nominal
– Ordinal
– Interval
– Ratio
Intro
Nominal Scale
• Nominal measurement consists of assigning
items to groups or categories
• No quantitative information is conveyed
and no ordering of the items is implied
– qualitative rather than quantitative
• Variables measured on a nominal scale are
often referred to as categorical or
qualitative variables
Intro
Ordinal Scale
• Measurements with ordinal scales are
ordered
– higher numbers represent higher values
• The intervals between the numbers
are not necessarily equal
• There is no "true" zero point for
ordinal scales
– the zero point is chosen arbitrarily
Intro
Interval Scale
• On interval scales, one unit represents the
same magnitude across the whole range of
the scale
• Interval scales do not have a "true" zero
point
• It is not possible to make statements about
how many times higher one score on that
scale is than another
– e.g. the Celsius scale for temperature
• equal differences on this scale represent equal
differences in temperature
• but a temperature of 30 degrees is not twice as
warm as one of 15 degrees
Intro
Ratio Scale
• Ratio scales are like interval scales
except they have true zero points
– e.g. the Kelvin scale of temperature
• this scale has an absolute zero
• a temperature of 300 Kelvin is twice as high
as a temperature of 150 Kelvin
• Example: Earth's mean temperature is about 14 °C (287 K), and it drops as a
function of the earth-sun distance's square root
– doubling the distance thus results in a temperature decrease by a factor of ~1.4
– the calculation should be made in Kelvin (287/1.4 ≈ 205), a difference of 82
degrees: the new temperature would be −68 °C, and not 14/1.4 = 10 °C
Intro
Classification to Ordinal
Classes
• Pattern classification addresses the issue of
assigning objects to different categories called
classes
• Most often those classes are of nominal scale
– discrete classes
– with no established relationship among them
• In some cases, additional information regarding
the arrangement of the classes is available
– e.g. an order among the classes is exhibited
– in that case the predicted classes are of ordinal scale
– classification is bridged to metric regression in a setting
called ranking learning or ordinal regression
• metric regression is applied to variables measured on interval or ratio scales
State of the Art
Intro
• Ordinal regression problems have been addressed in both the machine learning
and the statistics domains
• Extending binary classifiers
– In Frank (2001) the classes' ordering was encoded by a set of nested binary
classifiers
• the classification results were organized for prediction accordingly
– A constrained classification approach, based on binary classifiers, was
proposed in Har-Peled (2003)
• Extending SVM classifiers
– A loss function between pairs of ranks was used in Herbrich (2000)
• employing distribution-independent methods
– Modifications of support vector machines have been proposed in Shashua (2003),
Chu (2005), Pelckmans (2006)
• incorporating in the design of SVMs information regarding the order of the
classes
– A probabilistic kernel approach to ordinal regression was proposed by Chu (2005)
• Explicitly ordinal approaches
– In McCullagh (1980) multinomial logistic regression is extended to apply to
ordinal data by using cumulative probabilities
• proportional odds model
• proportional hazards model
– In Tutz (2003) generalized additive models were extended into a semiparametric
approach
• based on the maximization of the penalized log-likelihood
• choice of the used parameters based on minimization of the Akaike criterion
– In Johnson (1999) sampling techniques were employed in order to apply Bayesian
inference on parametric models for ordinal data
• Treating ordinal data as numeric
– In Krammer (2001) and Torra (2006) the ordinal values are transformed into
numeric, and then standard metric regression analysis is performed
• Ordinary Least Squares will be implied when referring to Metric Regression
in what follows
GLMs
Generalized Linear Models
• GLMs are a generalization of OLS regression
– they were formulated as a way of unifying under one framework
• linear regression
• logistic regression
• Poisson regression
– a general algorithm for maximum likelihood estimation in all these models
has been developed
• According to GLM theory
– a linear predictor is related to the distribution function of the dependent
variables through a link function
– each outcome of the dependent variables, Y, is assumed to be generated from
a particular exponential-type probability density function
• Normal, Binomial, Poisson distributions, etc.
• The mean, μ, of the distribution depends on the independent variables, x,
through
$E\{Y\} = \mu = g^{-1}(xb)$
where E{Y} is the expected value of Y, g is the link function, and b are the
unknown weights of the linear model
– The unknown weights b, also called regression coefficients, are typically
estimated with maximum likelihood or Bayesian techniques
• If Y follows the Normal distribution and g is the identity function, the GLM
is the standard linear regression model
• In the context of this presentation, x corresponds to feature vectors and
Y to classes
GLMs
Ordinary Least Squares
• The simplest and most popular GLM
• The distribution function is the normal distribution with constant variance,
and the link function is the identity
$E\{Y\} = xb$
• Unlike most other GLMs, the maximum likelihood estimates of the linear
weights are provided in a closed-form solution
• X is the matrix consisting of all available feature vectors x
• Y is the vector consisting of the observed values of the dependent variables Y
• The model's linear weights b are given by (a sketch follows)
$b = (X^T X)^{-1} X^T Y$
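For concreteness, a minimal NumPy sketch of this closed-form estimate (the
function name is illustrative, not from the presentation):

```python
import numpy as np

def ols_weights(X, y):
    """Closed-form OLS estimate b = (X^T X)^(-1) X^T y."""
    # Solving the normal equations is numerically safer than
    # forming the matrix inverse explicitly.
    return np.linalg.solve(X.T @ X, X.T @ y)
```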
GLMs
Ordinary Least Squares (cont.)
• OLS is designed to process interval or ratio
variables
• OLS estimates are likely to be satisfactory from a
statistical perspective when an ordinal level
variable is examined
– if it is measured in a relatively high number of ascending
categories
– if it can be assumed that the interval each category
represents is the same as the prior interval
• Thus, OLS can be applied to ordinal measurements
treated as if they were interval
– it is most likely that some of the assumptions of the
Gauss-Markov theorem are not met and the regression is
not the Best Linear Unbiased Estimator
GLMs
Ordinal Logistic Regression
• Explicitly takes into account an ordered categorical dependent variable and
does not assume any specific distance among the categories
• Different regression models that can be applied in case of ordinal
measurements have been proposed
– here the proportional odds model is assumed
• Like in multinomial logistic regression (MLR), in OLR
– a multinomial distribution is assumed
– the logit is selected as the link function
• The main difference between MLR and OLR is that rather than estimating the
probability of a single category, OLR estimates a cumulative probability
– i.e. the probability that the outcome is equal to or less than the category
of interest c
– c denotes the integer values used to label the classes
$P(Y \le c) = \sum_{i=1}^{c} P(Y = i)$
GLMs
Ordinal Logistic Regression (cont.)
• The proportional odds model employs the cumulative probability's logit
equation
$\mathrm{logit}(P(Y \le c)) = \ln\left(\frac{P(Y \le c)}{1 - P(Y \le c)}\right) = \tau_c + xb$
• The threshold values $\tau_c$ are different for each category
• The weights of the linear model contained in vector b are assumed to remain
constant for every category
• A Log-Likelihood function (LL) is created and the parameter values that
maximize that function are estimated using computational methods
• Using the logit equation, the probabilities of each instance belonging to
each class can be estimated (a sketch follows)
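As an illustration of that last point, a small NumPy sketch of how fitted
parameters would be turned into class probabilities; it assumes the
$\tau_c + xb$ parameterization written above, with increasing thresholds
`taus` and weights `b` already estimated (names are mine):

```python
import numpy as np

def class_probs(x, b, taus):
    """Class probabilities for one input x under the proportional odds
    model, assuming logit P(Y <= c) = tau_c + x.b for c = 1..K-1."""
    cum = 1.0 / (1.0 + np.exp(-(taus + x @ b)))  # P(Y <= c), c = 1..K-1
    cum = np.append(cum, 1.0)                    # P(Y <= K) = 1
    return np.diff(cum, prepend=0.0)             # P(Y=c) = P(Y<=c) - P(Y<=c-1)
```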
LCOC
Linear Classifier of Ordinal Classes
• Performs a numerical mapping of the K ordered classes ω1, ω2, …, ωK into
real numbers z1 < z2 < … < zK
– in case of metric regression such a numerical mapping is needed, and the
results do not correspond to probabilities
• Classification is based on the assumption of a linear relationship between
– the numerical input vectors and
– the numerical values assigned to the ordered classes
• A linear output y is produced as the dot product of input vector x and
vector b containing the weights of the linear model
• The output o derives as the class ωj whose assigned numerical value zj is
nearest to the linear output y; j is given by (a sketch follows)
$j = \arg\min_k |y - z_k|, \quad k = 1, 2, \dots, K$
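A short NumPy sketch of this decision rule (function and variable names are
illustrative):

```python
import numpy as np

def lcoc_predict(X, b, z):
    """Map each linear output y = x.b to the class index j whose
    numerical value z_j is nearest (0-based indices here)."""
    y = X @ b  # linear outputs, one per input vector
    return np.argmin(np.abs(y[:, None] - np.asarray(z)[None, :]), axis=1)
```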
LCOC
Training LCOC: the naïve case
• Arbitrary consequent numbers are assigned to the ordered classes:
if $ω^{(i)} = ω_k$ then $t^{(i)} = k$, $k \in \{1, 2, \dots, K\}$
• The linear output of the classifier is xb, where vector b has been estimated
by minimizing the Sum of Squared Errors (SSE)
$SSE = (t - Xb)^T (t - Xb)$
where matrix X is the design matrix consisting of all available input vectors,
and t denotes the vector of the corresponding targets
• Then $b = \arg\min_b SSE = (X^T X)^{-1} X^T t$ (a sketch follows)
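A sketch of the naïve training step, assuming class labels coded as integers
1..K:

```python
import numpy as np

def train_naive(X, labels):
    """Naive LCOC training: the class indices serve directly as
    regression targets t, and b is fitted by ordinary least squares."""
    t = labels.astype(float)               # t_i = k for class omega_k
    b = np.linalg.solve(X.T @ X, X.T @ t)  # b = (X^T X)^-1 X^T t
    z = np.arange(1, int(labels.max()) + 1, dtype=float)  # values 1..K
    return b, z
```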
LCOC
Least Squares Ordinal Classification (LSOC)
Training LCOC: the proposed case
• Target vector t is decomposed into a product of
– a known matrix S coding the target classes of the training samples
– and a parameter vector z whose elements contain the unknown numerical values
assigned to the K classes
• SSE becomes
$SSE = (Sz - Xb)^T (Sz - Xb)$
where
$S_{i,j} = 1$ if $ω^{(i)} = ω_j$, and $S_{i,j} = 0$ otherwise
• SSE minimization revisited
$\{ζ, b\} = \arg\min_{ζ,b} (Sz - Xb)^T (Sz - Xb)$
with $z = [A_1, ζ^T, A_K]^T$ and $ζ = [z_2, z_3, \dots, z_{K-1}]^T$
– the selection of A1 and AK does not affect the classification results
LCOC
Least Squares Ordinal Classification (LSOC)
Training LCOC: the proposed case (cont.)
• Since SSE is quadratic with respect to b and z, setting the partial
derivatives of SSE to zero results in
$b = (X^T X)^{-1} X^T S z$
$Pζ - \bar{S}^T X b = 0$
• where $P = \mathrm{diag}(p_2, p_3, \dots, p_{K-1})$ and
$\bar{S} = [S_2\ S_3\ \dots\ S_{K-1}]$
• If the estimated z parameters were also employed by OLS, the same b
parameters would have been estimated by both training methods
• The estimated ζ values are in fact the intra-class average values of the
linear outputs
• By substituting in the second equation the b vector given in the first, the
system of linear equations becomes (a sketch follows)
$(P - \bar{S}^T H \bar{S})\, ζ = A_1 \bar{S}^T H S_1 + A_K \bar{S}^T H S_K, \quad H = X (X^T X)^{-1} X^T$
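Putting the two optimality conditions together, a sketch of the closed-form
LSOC training; reading $p_j$ as the number of training samples in class $j$
(consistent with the intra-class-average remark above) is my interpretation,
and A1, AK are the arbitrary bounding values:

```python
import numpy as np

def train_lsoc(X, labels, K, A1=1.0, AK=None):
    """Closed-form LSOC training for labels coded 1..K (K >= 3)."""
    AK = float(K) if AK is None else AK
    n = X.shape[0]
    S = np.zeros((n, K))
    S[np.arange(n), labels - 1] = 1.0            # one-hot class coding

    H = X @ np.linalg.solve(X.T @ X, X.T)        # H = X (X^T X)^-1 X^T
    S1, Sbar, SK = S[:, 0], S[:, 1:K - 1], S[:, K - 1]
    P = np.diag(Sbar.sum(axis=0))                # diag(p_2, ..., p_{K-1})

    rhs = A1 * (Sbar.T @ H @ S1) + AK * (Sbar.T @ H @ SK)
    zeta = np.linalg.solve(P - Sbar.T @ H @ Sbar, rhs)

    z = np.concatenate(([A1], zeta, [AK]))       # full class-value vector
    b = np.linalg.solve(X.T @ X, X.T @ (S @ z))  # b = (X^T X)^-1 X^T S z
    return b, z
```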
LCOC
Invariant Error Measure
• When the numerical values of the classes are not fixed, the classification
results do not depend only on the magnitude of the error, but also on the
distance among the classes
• The proposed measure $r_z^2$ is also minimized by the LSOC training method
(a sketch follows)
$r_z^2 = \frac{MSE}{SMD} = \frac{SSE / M}{\left((A_K - A_1)/(K-1)\right)^2} = \frac{(K-1)^2 \, SSE}{M (A_K - A_1)^2}$
where $SMD = MD^2$ and $MD = (A_K - A_1)/(K-1)$ is the mean distance between
consecutive monotone z values
• However, unlike SSE, $r_z^2$
– takes into account the distance between the classes
– is invariant to the selection of the bounding values A1 and AK, since
rescaling $A_K - A_1$ rescales SSE by the same squared factor
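Under the reading above, the measure is straightforward to compute (a sketch,
taking M as the number of training samples):

```python
def invariant_error(sse, M, K, A1, AK):
    """r_z^2 = (SSE/M) / MD^2 with mean class distance
    MD = (AK - A1) / (K - 1); rescaling AK - A1 rescales SSE by the
    same squared factor, so the ratio is unaffected."""
    md = (AK - A1) / (K - 1)
    return (sse / M) / md ** 2
```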
Exper
Experimental Evaluation
• Both synthetic and real datasets are examined
• Synthetic input vectors were produced by means of a random number generator
– an arbitrary linear model produces linear targets
– quantizing the linear targets produces class targets
• quantization levels correspond to ordered classes
– initially, error is introduced into the linear model only by quantization
– the performance of the proposed training method was also assessed in case of
weaker linear dependency
• Additive White Gaussian Noise (AWGN) has been introduced into the linear
model before quantization
• Real datasets involve visual inspection of seam specimens classified into
five grades of quality
– the critical assumption of linear dependency is unverified
• if it is not valid, the classification accuracy of LSOC is anticipated to be
as poor as that of OLS, or even worse
– the produced results were also compared to those of Ordinal Logistic
Regression (OLR)
• OLR is a good choice for comparison, since its model employs the same number
of parameters as LSOC
• however, OLR relies on computational methods to estimate these parameters,
whereas LSOC employs a closed-form solution
Synthetic Datasets
Exper
• Using a uniform random number generator, the following were artificially
generated (a sketch follows the list)
– 1000 5-dimensional input vectors
– the vectors were augmented by adding an extra unit element
– and grouped into a design matrix of size 1000×6
– 6 arbitrary values were randomly selected as the weights of the linear model
– the design matrix was multiplied with the weights' vector, creating the
vector of linear targets
• consisting of 1000 values linearly dependent on the corresponding input
vectors
– the elements of the linear targets' vector were placed in monotonically
increasing order by rearranging the rows of the design matrix accordingly
• The 1st synthetic dataset contains 10 ordered classes with 100 input vectors
in each class
– the 1000 input vectors were grouped together in hundreds
• the first 100 input vectors of the matrix were classified to the first
class, and so on until the 10th class
• The 2nd synthetic dataset used the same design matrix and vector of linear
weights
– the 1st and the 2nd class were assigned 300 input vectors each
– the 8 remaining classes were assigned 50 vectors each
– the class targets of the input vectors are different for the second dataset
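A sketch of this construction (the seed and the weight range are arbitrary
choices, not taken from the presentation):

```python
import numpy as np

rng = np.random.default_rng(0)

X = rng.uniform(size=(1000, 5))          # 1000 5-dimensional inputs
X = np.hstack([X, np.ones((1000, 1))])   # augment with a unit element
w = rng.uniform(-1.0, 1.0, size=6)       # 6 arbitrary linear weights
y_lin = X @ w                            # linear targets

order = np.argsort(y_lin)                # rearrange rows so that the
X, y_lin = X[order], y_lin[order]        # linear targets increase

labels_1st = np.repeat(np.arange(1, 11), 100)                  # 10 x 100
labels_2nd = np.repeat(np.arange(1, 11), [300, 300] + [50] * 8)
```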
Exper
Synthetic Datasets
• Euclidean distance of the z values from the norm. centers
– 1st dataset: LSOC 0.05, OLS 0.32
– 2nd dataset: LSOC 0.54, OLS 0.90
Exper
Synthetic Datasets
[results table]
• R² denotes the coefficient of determination
• CA denotes Classification Accuracy
• CV denotes 10-fold Cross-Validation
Exper
Synthetic Datasets
[plots: noise results for the 1st and 2nd synthetic datasets]
• AWGN has been introduced into the estimation of the linear targets
• The Mean Distance (MD) among the classes has been calculated
• The standard deviation of the added noise was set from 5% of MD to 100% of MD
– with a 5% of MD increment
• Thus, for each dataset 20 different cases with increasing noise levels were
constructed and tested (a sketch follows)
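A sketch of the noise sweep, assuming the linear targets `y_lin` and the mean
class distance `MD` have already been computed:

```python
import numpy as np

def noisy_cases(y_lin, MD, seed=0):
    """Yield 20 noisy versions of the linear targets, with the noise
    standard deviation running from 5% to 100% of MD in 5% steps."""
    rng = np.random.default_rng(seed)
    for ratio in np.arange(0.05, 1.001, 0.05):
        yield ratio, y_lin + rng.normal(0.0, ratio * MD, size=y_lin.shape)
```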
Exper
Real Datasets
• Image database of 325 seam specimens, belonging
to three different types of fabric
• Specimen size approximately 20×4 cm
• A committee of three experts labelled each
specimen by assigning a grade denoting the quality
of the seam
– 1 (worst) to 5 (best)
• For each specimen three ratings are assigned
– the median is selected as the actual grade
– the average agreement of each expert with the median
ratings was 80.3% ± 1.8%
• 3 different feature sets all based on intensity
curves
– Roughness features
– FFT features
– Fractal features
• 4 different features in each set
Exper
Textile Seam Quality Control
[figure: seam quality grades according to the ISO 7700 standard]
Exper
Pre-process
[figure: pre-processing stages (a)–(e)]
Exper
Intensity Curves
[figure: image row of a seam specimen with intensity curves I(1), I(2), I(3),
I(4) and their column-wise mean curves S(1), S(2), S(3), S(4)]
$S_m^{(j)} = \frac{1}{N_j} \sum_{n=1}^{N_j} I_{m,n}^{(j)}$
Mean intensity values (column-wise)
Exper
Feature Extraction
Roughness Features
• Moving Average filter
$MV_m^{(j)} = \frac{1}{W} \sum_{k=m-W/2}^{m+W/2} S_k^{(j)}$
• Intensity Deviation (a sketch follows)
$R_j = \frac{1}{M} \sum_{m=1}^{M} \left| S_m^{(j)} - MV_m^{(j)} \right|$
Exper
Feature Extraction
FFT Features
• Using the first 40 FFT coefficients produced from each intensity curve
• Applying averaging using different window centers and sizes
• Selecting the window settings that present the highest correlation with the
quality grades (a sketch follows)
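A sketch of one such FFT feature; using the real FFT and a symmetric window
are my assumptions, with the (center, width) pair selected beforehand by its
correlation with the quality grades:

```python
import numpy as np

def fft_feature(S, center, width):
    """Average magnitude of the first 40 FFT coefficients of the
    intensity curve S inside a window around `center`."""
    mag = np.abs(np.fft.rfft(S))[:40]   # first 40 FFT coefficients
    lo = max(0, center - width // 2)
    hi = min(40, center + width // 2 + 1)
    return float(mag[lo:hi].mean())
```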
Exper
Feature Extraction
Fractal Features
• The Modified Pixel Dilation method (MPD) is applied to an intensity curve,
estimating its fractal dimension (a sketch follows)
– each intensity curve is treated as a binary image
– n successive dilation operations are performed
– the area S(n) occupied by the produced curves and the area E(n) occupied by
a single pixel that has been dilated by the same morphological operator are
calculated for different values of n
– the relationship among the fractal dimension D, S(n), and E(n) is given by
$\frac{S(n)}{E(n)} = r\, E(n)^{-D/2}$
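A sketch of a dilation-based estimate of D; treating sqrt(E(n)) as the
effective scale and S(n)/E(n) as the box count is my reading of the MPD
relationship above, not a verified reproduction of it:

```python
import numpy as np
from scipy.ndimage import binary_dilation

def fractal_dimension(curve_img, n_max=10):
    """Estimate D from the log-log slope of S(n)/E(n) versus
    sqrt(E(n)): S(n) is the area of the n-times dilated curve,
    E(n) that of an n-times dilated single pixel."""
    pixel = np.zeros_like(curve_img, dtype=bool)
    pixel[pixel.shape[0] // 2, pixel.shape[1] // 2] = True
    S, E = [], []
    for n in range(1, n_max + 1):
        S.append(binary_dilation(curve_img, iterations=n).sum())
        E.append(binary_dilation(pixel, iterations=n).sum())
    S, E = np.asarray(S, float), np.asarray(E, float)
    slope = np.polyfit(np.log(np.sqrt(E)), np.log(S / E), 1)[0]
    return -slope
```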
Exper
Roughness Results
• LSOC improves on the results of the naïve case
– it outperforms OLS when more than 20 training samples are used
• LSOC generalizes better than OLR on limited training sets
– it outperforms OLR when fewer than 45 training samples are used
Exper
FFT Results
• Similar to RF results
• LSOC's performance is even closer to OLR's
– indicating a stronger linear relationship between the FFT features and the
quality grades
Exper
Fractal Results
• Different from the RF or FFT results
• Both metric methods are outperformed by OLR, even for limited training sets
• LSOC's performance is slightly worse than OLS's
• This indicates a weak linear relationship between the Fractal features and
the quality grades
Exper
Summarizing Results
• In case of the synthetic datasets
– the linear dependency between feature vectors and class values is
established
– the proposed method produces significantly better results than the naïve
approach
– the difference in performance is even greater in case of the 2nd synthetic
dataset, where the intervals between the classes are less uniform
• In case of the real datasets
– OLR presents the highest performance for all feature sets
• provided a large number of training samples is available
– LSOC presents, in almost every case, higher classification accuracy than OLS
– if the linear relation between the inputs and the outputs is not very
strong, the proposed method is not likely to outperform the naïve approach
• in such cases, however, the performance of both classifiers is very poor
anyway, so other approaches, like OLR, should be considered
Concl
Conclusion
• A common strategy for selecting an appropriate classification method for a
specific task is to
– start with the simplest one and check its performance
– if the performance is not adequate, consider more complex methods
• The OLS regression approach is by far the simplest of all ordinal
classification methods
– presenting computational efficiency
– and ease of implementation
• In the naïve case, arbitrary numerical values are assigned to the ordered
classes
– an inappropriate numerical mapping can result in poor classification
performance
• LSOC estimates an optimal mapping using a novel goodness-of-fit measure
– like in OLS, a linear model is employed
– the model's parameters derive through a closed-form expression
– the computational efficiency of the naïve approach is retained
Concl
Conclusion (cont.)
• In the experimental evaluation it was demonstrated that if LSOC is used
instead of OLS the classification accuracy can be significantly increased
– the accuracy of 76% and 39% presented by OLS in case of the 1st and 2nd
synthetic datasets was increased to 93% and 83%, respectively, in case of LSOC
– a similar trend was present both when Gaussian noise was added to the
synthetic datasets and in case of the real datasets
• LSOC was also compared to OLR
– a more sophisticated method explicitly designed to handle ordinal data
– even though OLR achieves higher accuracy when a large number of training
samples is employed, it is outperformed by LSOC when this number decreases
– LSOC can be an attractive choice in case a limited number of training
samples is available
– due to its computational simplicity, LSOC is also an attractive choice if
speed of calculation is an issue
• In future work the performance of LSOC can be further investigated in case
non-linear kernels are applied to the original input vectors
– transferring them to a higher-dimensional space where linearity holds
Concl
“Remember that all models are wrong;
the practical question is how wrong do
they have to be to not be useful.”
George E.P. Box