CHAPTER 15 EXPLORING RELATIONSHIPS

15.1 INTRODUCTION

The most common data analysis situations in the business arena:
 The response variable is measured.
 The explanatory variable is either an attribute or measured.
15.2 COMMON DATA-ANALYSIS METHODOLOGY

STEP 1 Framework for defining a Data-Analysis
 A clear idea of what is meant by a connection between
the response variable and the explanatory variable.

STEP 2 Initial Data Analysis (IDA)
 Use simple sample descriptive statistics to take a first look at the nature of the link between the variables. There are three possible outcomes:
 Strong evidence to support a link.
 No evidence of a link.
 Inconclusive: further, more sophisticated data analysis is required.

STEP 3 Further Data Analysis (FDA)
 The sample evidence is consistent with there being no
link/connection/relationship between the response
variable and the explanatory variable.
 The sample evidence is consistent with there being a
link/connection/relationship between the response
variable and the explanatory variable

STEP 4 Describe Relationship
 How to undertake and interpret the I.D.A.
 How to undertake and interpret the Further Data
Analysis.
Yes: there is evidence of a relationship, in which case the link needs to be described.
No: there is no evidence of a relationship.

Example 1
A university investigates the salary of its
graduates five years after graduating

Example 2
 CREDIT Scenario
15.3 EXPLORING RELATIONSHIPS 1: MEASURED RESPONSE VARIABLE, ATTRIBUTE EXPLANATORY VARIABLE

Definition
There is a relationship between a measured
response and an attribute explanatory variable if
the average value of the response is dependent
on the level of the attribute explanatory variable.

There is no link
 Given a measured response and an attribute explanatory variable with two levels, 1 & 2: if the statistical distributions of the response variable for attribute level 1 and attribute level 2 are exactly the same, then the level of the attribute variable has no influence on the value of the response.

There is a link
 Given a measured response and an attribute explanatory variable with two levels, 1 & 2: if the statistical distributions of the response variable for attribute level 1 and attribute level 2 have different means, then the level of the attribute variable does influence the response variable.

Illustrative Example
Response Variable: Amount Spent on Clothes
per month
Attribute Explanatory Variable: Gender
(Male/Female)

Irrespective of the shape/structure of the
distributions,
Equal means for no connection
Unequal means for a connection
15.4 WORKING WITH SAMPLE DATA

The Initial Data Analysis (I.D.A)
 Calculate the mean value of the response variable for each level of the attribute explanatory variable.
 Construct a boxplot for each level of the attribute explanatory variable.
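
The two I.D.A. steps can be sketched in Python as a rough illustration. The sketch below assumes the data arrive as two parallel lists (attribute levels and responses); the function name `describe_by_level` and the spending figures are hypothetical, chosen only for illustration. It computes the mean for each level together with the five-number summary that a boxplot displays. Quartile conventions differ between packages, so the quartiles need not agree exactly with those of other software.

```python
from collections import defaultdict
from statistics import mean, quantiles


def describe_by_level(levels, responses):
    """Group the responses by attribute level and summarise each group."""
    groups = defaultdict(list)
    for level, value in zip(levels, responses):
        groups[level].append(value)

    summary = {}
    for level, values in groups.items():
        # quantiles(..., n=4) returns the three quartile cut points.
        q1, median, q3 = quantiles(values, n=4)
        summary[level] = {
            "n": len(values),
            "mean": mean(values),
            "min": min(values),
            "q1": q1,
            "median": median,
            "q3": q3,
            "max": max(values),
        }
    return summary


# Hypothetical monthly clothes spending (illustrative values, not real data).
gender = ["Male", "Male", "Male", "Male", "Female", "Female", "Female", "Female"]
spend = [20, 25, 30, 25, 70, 75, 80, 75]

for level, stats in describe_by_level(gender, spend).items():
    print(level, stats)
```

The minimum, quartiles and maximum printed for each level are the values a boxplot would draw, so the group means and the boxplots can be compared side by side.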

Outcome 1
Response Variable: Amount Spent on Clothes
per month
Attribute Explanatory Variable: Gender
(Male/Female)
Mean
 Attribute variable level 1: Mean = 50
 Attribute variable level 2: Mean = 50
Boxplot

Interpretation:
The amount spent on clothes is not gender
specific, i.e. gender is not a factor that
influences spending on clothes.
In this situation there is clearly no
connection/relationship.

Outcome 2
Response Variable: Amount Spent on Clothes
per month
Attribute Explanatory Variable: Gender
(Male/Female)
Mean
 Attribute variable level 1: Mean = 25
 Attribute variable level 2: Mean = 75
Boxplot

Interpretation:
The amount spent on clothes is gender specific,
i.e. gender is a factor that influences spending
on clothes. Females spend on average £50 per
month more on clothes than Males.
 In this situation there is clearly a
connection/relationship

Outcome 3
Response Variable: Amount Spent on Clothes
per month
Attribute Explanatory Variable: Gender
(Male/Female)
Mean
 Attribute variable level 1: Mean = 45
 Attribute variable level 2: Mean = 55
Boxplot

Interpretation:
There is not enough evidence to form a clear
judgement
In this situation the evidence is insufficient to
draw a conclusion and further data analysis is
required.
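
The arithmetic behind the three outcomes can be written out as a short Python sketch. It assumes that level 1 is Male and level 2 is Female, which is inferred from the Outcome 2 interpretation rather than stated explicitly. The means are those quoted in the outcomes; the variable names are illustrative.

```python
# Group means quoted in the three outcomes: (level 1 mean, level 2 mean).
# Assumption: level 1 = Male, level 2 = Female (inferred from Outcome 2).
outcomes = {
    "Outcome 1": (50, 50),
    "Outcome 2": (25, 75),
    "Outcome 3": (45, 55),
}

# Difference in sample means, level 2 minus level 1, for each outcome.
differences = {
    name: level2 - level1 for name, (level1, level2) in outcomes.items()
}

for name, diff in differences.items():
    print(f"{name}: level 2 mean - level 1 mean = {diff}")
```

A difference of 0 or of 50 is easy to read. A difference of 10 cannot be judged from the means alone, because it depends on how much the individual values vary, which is why the boxplots are examined alongside the means.
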
15.5 THE INITIAL DATA ANALYSIS AND MINITAB

To investigate the relationship between 'Amount Borrowed on Credit' and 'Does the customer own their own house?' (coded 0 = Yes; 1 = No):
 Load the CREDIT data.
 Command: Stat > Basic Statistics > Display Descriptive Statistics (graphs follow).
Variable  OWNER   N    Mean  StDev  Minimum      Q1  Median      Q3  Maximum
CREDIT    0      69  229.04  59.95   111.00  182.00  228.00  271.00   387.00
          1      31   265.0   63.5    146.0   225.0   248.0   323.0    388.0
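
One informal way to read the OWNER summary is sketched below in Python, using only the figures reported in the Minitab output. It computes the difference in sample means and checks whether the interquartile ranges (the boxes of the two boxplots) overlap. The overlap check is an illustrative heuristic introduced here, not a rule stated in these notes, and the variable names are assumptions.

```python
# Summary statistics copied from the Minitab output for CREDIT by OWNER
# (0 = owns their own house, 1 = does not).
summary = {
    0: {"n": 69, "mean": 229.04, "q1": 182.0, "median": 228.0, "q3": 271.0},
    1: {"n": 31, "mean": 265.0, "q1": 225.0, "median": 248.0, "q3": 323.0},
}

# Difference in sample means: non-owners minus owners.
mean_difference = summary[1]["mean"] - summary[0]["mean"]

# Overlap of the interquartile ranges (the boxes of the two boxplots).
overlap_low = max(summary[0]["q1"], summary[1]["q1"])
overlap_high = min(summary[0]["q3"], summary[1]["q3"])
boxes_overlap = overlap_low <= overlap_high

print(f"Difference in means (non-owners - owners): {mean_difference:.2f}")
print(f"Boxes overlap: {boxes_overlap} (from {overlap_low} to {overlap_high})")
```

In this sample the non-owners borrowed more on average, but the two boxes share the range 225 to 271, so the means alone do not settle whether there is a link.
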
[Figure: Histogram (with Normal Curve) of CREDIT by OWNER. Panel variable: OWNER; horizontal axis: CREDIT; vertical axis: Frequency. Panel legends: OWNER 0: Mean 229.0, StDev 59.95, N 69; OWNER 1: Mean 265.0, StDev 63.47, N 31.]
[Figure: Histogram of CREDIT (Normal), with legend by OWNER (0, 1); horizontal axis: CREDIT; vertical axis: Frequency.]
[Figure: Boxplot of CREDIT vs OWNER; axes: OWNER and CREDIT.]

The investigation of the connection/link between the response variable 'Amount Borrowed on Credit' and the attribute explanatory variable 'Region', which has five levels, is illustrated below:
Variable  REGION   N   Mean  StDev  Minimum     Q1  Median     Q3  Maximum
CREDIT    1       17  254.5   54.2    164.0  225.0   247.0  284.0    387.0
          2       20  248.5   54.4    160.0  206.5   249.5  302.5    328.0
          3       22  281.1   61.0    192.0  224.0   276.0  342.0    388.0
          4       24  208.7   57.5    111.0  168.0   219.5  249.3    317.0
          5       17  207.6   57.0    135.0  163.5   191.0  259.0    309.0
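
With five levels it helps to order the regional means before interpreting them. The Python sketch below uses only the means and sample sizes reported in the Minitab output; the variable names are illustrative. The overall mean is rebuilt from rounded regional means, so it is approximate.

```python
# Means and sample sizes copied from the Minitab output for CREDIT by REGION.
region_means = {1: 254.5, 2: 248.5, 3: 281.1, 4: 208.7, 5: 207.6}
region_n = {1: 17, 2: 20, 3: 22, 4: 24, 5: 17}

# Rank the regions from highest to lowest sample mean.
ranked = sorted(region_means.items(), key=lambda item: item[1], reverse=True)
highest_region, highest_mean = ranked[0]
lowest_region, lowest_mean = ranked[-1]
spread_of_means = highest_mean - lowest_mean

# Overall mean, weighting each regional mean by its sample size.
total_n = sum(region_n.values())
overall_mean = sum(region_means[r] * region_n[r] for r in region_means) / total_n

for region, value in ranked:
    print(f"Region {region}: mean = {value}")
print(f"Highest minus lowest mean: {spread_of_means:.1f}")
print(f"Overall mean (n = {total_n}): {overall_mean:.2f}")
```

Regions 4 and 5 have the lowest sample means and region 3 the highest. Whether those differences amount to a link still depends on the spread within each region, which the boxplots show.
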
[Figure: Histogram (with Normal Curve) of CREDIT by REGION. Panel variable: REGION; horizontal axis: CREDIT; vertical axis: Frequency. Panel legends: REGION 1: Mean 254.5, StDev 54.18, N 17; REGION 2: Mean 248.5, StDev 54.41, N 20; REGION 3: Mean 281.1, StDev 61.01, N 22; REGION 4: Mean 208.7, StDev 57.48, N 24; REGION 5: Mean 207.6, StDev 56.98, N 17.]
[Figure: Histogram of CREDIT (Normal), with legend by REGION (1 to 5); horizontal axis: CREDIT; vertical axis: Frequency.]
[Figure: Boxplot of CREDIT vs REGION; axes: REGION and CREDIT.]
15.6 SUMMARY
The Data Analysis Situations
 There are four data analysis situations.
 Only those with a measured response variable are discussed here.

Definition of connection/link/Relationship
The formal definition of no link is:
 If the average value of the response variable is
independent of the level of the attribute explanatory
variable then the response variable and the attribute
explanatory variable are independent (not
connected).
The formal definition of link is:
 If the average value of the response variable is
dependent on the level of the attribute explanatory
variable then the attribute explanatory variable
influences the value of the response variable, so the
response variable and the attribute explanatory
variable are connected.

The Data Analysis Methodology (Figure 2)
 The Initial Data Analysis (I.D.A).
 The Further Data Analysis (FDA), if required.
 Describe the connection if one exists.