Statistics 400 - Lecture 23
Last Day: Regression
Today: Finish Regression, Test for Independence (Section 13.4)
Suggested problems: 13.21, 13.23
Computer Output
Will not normally compute the regression line, standard errors, … by hand
Key will be identifying what the computer is giving you
SPSS Example
[SPSS regression output: Model Summary, ANOVA, and Coefficients tables]
What is the Coefficients Table?
What is the Model Summary?
What is the ANOVA Table?
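As a rough guide to the three questions above, the sketch below fits a simple linear regression by hand and groups each quantity under the SPSS table that usually reports it. The data are a hypothetical toy set, not the data behind the lecture's SPSS output, and all variable names are illustrative.

```python
import math

# Hypothetical toy data (NOT the lecture's data set).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least squares estimates (reported in the Coefficients table).
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Sums of squares, mean square, and F statistic (reported in the ANOVA table).
ss_total = sum((yi - y_bar) ** 2 for yi in y)
ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
ss_reg = ss_total - ss_resid
ms_resid = ss_resid / (n - 2)
f_stat = (ss_reg / 1) / ms_resid

# R squared and standard error of the estimate (reported in the Model Summary).
r_squared = ss_reg / ss_total
std_error_estimate = math.sqrt(ms_resid)

# Standard error and t statistic for the slope (Coefficients table).
se_slope = std_error_estimate / math.sqrt(s_xx)
t_slope = slope / se_slope

print("Coefficients:", intercept, slope, se_slope, t_slope)
print("Model Summary:", r_squared, std_error_estimate)
print("ANOVA:", ss_reg, ss_resid, ss_total, f_stat)
```

In simple linear regression the F statistic in the ANOVA table equals the square of the slope's t statistic, which is a quick way to check that the tables are being read consistently.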
Back to Probability
The probability of an event, A, occurring can often be modified after observing whether or not another event, B, has taken place
Example: An urn contains 2 green balls and 3 red balls. Suppose 2 balls are selected at random, one after another, without replacement from the urn.
Find P(Green ball appears on the first draw)
Find P(Green ball appears on the second draw)
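One way to check the two probabilities is to list every equally likely ordered pair of draws and count. The sketch below does this by brute force; the helper name `prob` is illustrative.

```python
from fractions import Fraction
from itertools import permutations

# Urn from the example: 2 green balls and 3 red balls.
balls = ["G", "G", "R", "R", "R"]

# Every ordered pair of distinct balls (by position) is equally likely
# when drawing twice without replacement.
draws = list(permutations(range(len(balls)), 2))

def prob(event):
    """Probability of an event as (favourable outcomes) / (all outcomes)."""
    favourable = sum(1 for d in draws if event(d))
    return Fraction(favourable, len(draws))

p_green_first = prob(lambda d: balls[d[0]] == "G")
p_green_second = prob(lambda d: balls[d[1]] == "G")

print(p_green_first, p_green_second)
```

Both probabilities come out to 2/5: before anything is observed, the second draw is just as likely to be green as the first.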
Conditional Probability
The conditional probability of A given B:
P(A|B) = P(A and B) / P(B)
Example: An urn contains 2 green balls and 3 red balls. Suppose 2 balls are selected at random one after another without replacement from the urn.
A = {Green ball appears on the second draw}
B = {Green ball appears on the first draw}
Find P(A|B) and P(A^c|B)
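The definition P(A|B) = P(A and B) / P(B) can be applied directly to the urn example by enumerating the ordered draws. This is a minimal sketch; the helper `prob` and the event functions are illustrative names.

```python
from fractions import Fraction
from itertools import permutations

balls = ["G", "G", "R", "R", "R"]
draws = list(permutations(range(len(balls)), 2))  # equally likely ordered pairs

def prob(event):
    return Fraction(sum(1 for d in draws if event(d)), len(draws))

def a(d):
    return balls[d[1]] == "G"   # A: green on the second draw

def b(d):
    return balls[d[0]] == "G"   # B: green on the first draw

p_b = prob(b)
p_a_given_b = prob(lambda d: a(d) and b(d)) / p_b
p_not_a_given_b = prob(lambda d: (not a(d)) and b(d)) / p_b

print(p_a_given_b, p_not_a_given_b)
```

Here P(A|B) = 1/4 and P(A^c|B) = 3/4, so the two conditional probabilities sum to 1, and P(A|B) differs from the unconditional P(A) = 2/5.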
Example:
Records of student patients at a dentist’s office concerning fear of visiting the dentist suggest the following proportions
School Level          Elementary   Middle   High
Fear Dentist                0.12     0.08   0.05
Do Not Fear Dentist         0.28     0.25   0.22
Let A = {Fears Dentist}; B = {Middle School}
Find P(A|B)
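A short sketch of the calculation for this example follows. It assumes the proportions are laid out as Fear Dentist = 0.12, 0.08, 0.05 and Do Not Fear Dentist = 0.28, 0.25, 0.22 across Elementary, Middle, and High; the dictionary names are illustrative.

```python
from fractions import Fraction

# ASSUMED cell layout: (fear status, school level) -> joint proportion.
joint = {
    ("fear", "elementary"): Fraction("0.12"),
    ("fear", "middle"): Fraction("0.08"),
    ("fear", "high"): Fraction("0.05"),
    ("no fear", "elementary"): Fraction("0.28"),
    ("no fear", "middle"): Fraction("0.25"),
    ("no fear", "high"): Fraction("0.22"),
}

# P(B): marginal probability of middle school (sum down the column).
p_b = sum(p for (_, school), p in joint.items() if school == "middle")

# P(A and B): fears the dentist AND is in middle school.
p_a_and_b = joint[("fear", "middle")]

p_a_given_b = p_a_and_b / p_b
print(p_a_given_b, float(p_a_given_b))
```

Under that layout P(A|B) = 0.08 / 0.33 = 8/33, or roughly 0.24.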
Conditional Probability and Independence
If fearing the dentist does not depend on age or school level, what would we expect the probability distribution in the previous example to look like?
What does this imply about P(A|B)?
If A and B are independent, what form should the conditional probability take?
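If A and B are independent, then P(A and B) = P(A)P(B), so P(A|B) = P(A). The sketch below compares P(Fear | school level) with P(Fear) for the dentist example and builds the joint proportions that independence would imply. It assumes the table reads Fear Dentist = 0.12, 0.08, 0.05 and Do Not Fear Dentist = 0.28, 0.25, 0.22 across Elementary, Middle, and High; names are illustrative.

```python
from fractions import Fraction

schools = ["Elementary", "Middle", "High"]

# ASSUMED layout of the proportions in the dentist example.
fear = dict(zip(schools, map(Fraction, ["0.12", "0.08", "0.05"])))
no_fear = dict(zip(schools, map(Fraction, ["0.28", "0.25", "0.22"])))

# Marginal probabilities.
p_fear = sum(fear.values())
p_school = {s: fear[s] + no_fear[s] for s in schools}

# Conditional probability of fear within each school level.
p_fear_given_school = {s: fear[s] / p_school[s] for s in schools}

# Joint proportions for the Fear row if fear and school level were independent.
independent_joint = {s: p_fear * p_school[s] for s in schools}

for s in schools:
    print(s, float(p_fear_given_school[s]), float(p_fear), float(independent_joint[s]))
```

Under independence every conditional probability would equal P(Fear) = 0.25. With these proportions they are 0.30, about 0.24, and about 0.19, so fear appears to decline with school level.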
Summarizing Bivariate Categorical Data
Have studied bivariate continuous data (regression)
Often have two (or more) categorical measurements taken on the same sampling unit
Data usually summarized in 2-way tables
Often called contingency tables
Test for Independence
Situation: We draw ONE random sample of predetermined size and record 2 categorical measurements
Because we do not know in advance how many sampled units will fall into each category, neither the column totals nor the row totals are fixed
Example:
Survey conducted by sampling 400 people who were questioned regarding union membership and attitude towards decreased spending on social programs
              Union   Non-Union   Total
Support         112          84     196
Indifferent      36          68     104
Opposed          28          72     100
Total           176         224     400
Would like to see whether union membership is independent of attitude towards decreased spending on social programs
If the two distributions are independent, what does that say about the probability of a randomly selected individual falling into a particular category?
What would the expected count be for each cell?
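Under independence, the probability of a cell is the product of its row and column probabilities, so the expected count is (row total × column total) / n. A minimal sketch for the union survey, with illustrative variable names:

```python
# Observed counts from the union survey (n = 400).
observed = {
    "Support": {"Union": 112, "Non-Union": 84},
    "Indifferent": {"Union": 36, "Non-Union": 68},
    "Opposed": {"Union": 28, "Non-Union": 72},
}

row_totals = {row: sum(cells.values()) for row, cells in observed.items()}

col_totals = {}
for cells in observed.values():
    for col, count in cells.items():
        col_totals[col] = col_totals.get(col, 0) + count

n = sum(row_totals.values())

# Expected count under independence: n * (row total / n) * (column total / n).
expected = {
    row: {col: row_totals[row] * col_totals[col] / n for col in col_totals}
    for row in observed
}

for row, cells in expected.items():
    print(row, cells)
```

For example, the expected count for Support and Union is 196 × 176 / 400 = 86.24, compared with 112 observed.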
What test statistic could we use?
Formal Test
Hypotheses:
Test Statistic:
P-Value:
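One way to fill in the three steps for the union survey is the Pearson chi-square test of independence, sketched below. H0 is that union membership and attitude are independent; Ha is that they are not. The degrees of freedom are (rows - 1)(columns - 1) = 2, and for exactly 2 degrees of freedom the chi-square upper tail probability is exp(-x/2), which is why no statistics library is needed here. Variable names are illustrative.

```python
import math

# Observed counts: rows = Support, Indifferent, Opposed; columns = Union, Non-Union.
observed = [
    [112, 84],
    [36, 68],
    [28, 72],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Test statistic: sum over cells of (observed - expected)^2 / expected.
chi_sq = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        exp_count = row_totals[i] * col_totals[j] / n
        chi_sq += (obs - exp_count) ** 2 / exp_count

df = (len(row_totals) - 1) * (len(col_totals) - 1)

# P-value: upper tail of the chi-square distribution.
# The closed form exp(-x / 2) is valid ONLY because df == 2 here.
p_value = math.exp(-chi_sq / 2)

print(chi_sq, df, p_value)
```

The statistic is about 27.8 on 2 degrees of freedom, giving a very small p-value, so the data are hard to reconcile with independence.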
Spurious Dependence
Consider admissions from a fictional university by gender
Counts:
         Male   Female   Total
Admit     490      280     770
Deny      210      220     430
Proportions (within gender):
         Male   Female
Admit    0.70     0.56
Deny     0.30     0.44
Is there evidence of discrimination?
Consider the same data, separated by school applied to:
Business School:
         Male   Female
Admit     480      180
Deny      120       20
         Male   Female
Admit    0.80     0.90
Deny     0.20     0.10
Law School:
         Male   Female
Admit      10      100
Deny       90      200
         Male   Female
Admit    0.10     0.33
Deny     0.90     0.67
Simpson’s Paradox:
Reversal of a comparison due to aggregation
Contradiction of the initial finding because of the presence of a lurking variable
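The reversal can be checked directly from the admission counts. The sketch below computes admission rates within each school and then for the two schools combined; the dictionary and function names are illustrative.

```python
from fractions import Fraction

# (admitted, denied) counts by school and gender, from the tables above.
counts = {
    "Business": {"Male": (480, 120), "Female": (180, 20)},
    "Law": {"Male": (10, 90), "Female": (100, 200)},
}

def admit_rate(admitted, denied):
    return Fraction(admitted, admitted + denied)

# Admission rates within each school.
by_school = {
    school: {g: admit_rate(*cells) for g, cells in genders.items()}
    for school, genders in counts.items()
}

# Admission rates after aggregating over schools.
overall = {}
for gender in ("Male", "Female"):
    admitted = sum(counts[s][gender][0] for s in counts)
    denied = sum(counts[s][gender][1] for s in counts)
    overall[gender] = admit_rate(admitted, denied)

for school, rates in by_school.items():
    print(school, {g: float(r) for g, r in rates.items()})
print("Overall", {g: float(r) for g, r in overall.items()})
```

Females have the higher admission rate within each school, yet males have the higher rate overall. School applied to is the lurking variable: most males applied to the business school, which admits most applicants, while most females applied to the law school, which admits few.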