Statistics 400 - Lecture 23

• Last Day: Regression
• Today: Finish Regression, Test for Independence (Section 13.4)
• Suggested problems: 13.21, 13.23

Computer Output

• Will not normally compute the regression line, standard errors, … by hand
• Key will be identifying what the computer is giving you

SPSS Example

[SPSS regression output: Model Summary, ANOVA, and Coefficients tables]

• What is the Coefficients Table?

• What is the Model Summary?

• What is the ANOVA Table?

Back to Probability

• The probability of an event, A, occurring can often be modified after observing whether or not another event, B, has taken place
• Example: An urn contains 2 green balls and 3 red balls. Suppose 2 balls are selected at random one after another without replacement from the urn.

• Find P(Green ball appears on the first draw)
• Find P(Green ball appears on the second draw)
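One way to check both answers is to list every equally likely ordered pair of draws and count. The sketch below does that in Python; the names `urn`, `draws`, and `prob` are illustrative and not from the lecture.

```python
from fractions import Fraction
from itertools import permutations

# Urn from the example: 2 green (G) and 3 red (R) balls.
urn = ["G", "G", "R", "R", "R"]

# Drawing two balls one after another without replacement: every ordered
# pair of distinct balls is equally likely (5 * 4 = 20 outcomes).
draws = list(permutations(range(len(urn)), 2))

def prob(event):
    """Probability of an event as favourable outcomes / all outcomes."""
    favourable = sum(1 for d in draws if event(d))
    return Fraction(favourable, len(draws))

p_green_first = prob(lambda d: urn[d[0]] == "G")
p_green_second = prob(lambda d: urn[d[1]] == "G")

print(p_green_first)   # 2/5
print(p_green_second)  # 2/5
```

Both probabilities come out to 2/5: before anything is observed, the second draw is no more or less likely to be green than the first.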

Conditional Probability

• The Conditional Probability of A given B:

$$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$$

Example: An urn contains 2 green balls and 3 red balls. Suppose 2 balls are selected at random one after another without replacement from the urn.

• A = {Green ball appears on the second draw}
• B = {Green ball appears on the first draw}
• Find P(A|B) and P(A^c|B)
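A worked version of this example, as a sketch: it assumes all ordered pairs of draws are equally likely, so that P(A and B) is the chance of green on both draws.

$$P(B) = \frac{2}{5}, \qquad P(A \text{ and } B) = \frac{2}{5} \cdot \frac{1}{4} = \frac{1}{10}$$

$$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)} = \frac{1/10}{2/5} = \frac{1}{4}, \qquad P(A^c \mid B) = 1 - P(A \mid B) = \frac{3}{4}$$

This matches the direct count: once a green ball has been removed, 1 of the 4 remaining balls is green.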

Example:

• Records of student patients at a dentist’s office concerning fear of visiting the dentist suggest the following proportions:

                       School Level
                       Elementary   Middle   High
Fear Dentist           0.12         0.08     0.05
Do Not Fear Dentist    0.28         0.25     0.22

• Let A = {Fears Dentist}; B = {Middle School}
• Find P(A|B)
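The calculation can be sketched in a few lines of Python. This assumes the proportions are read column by column (Elementary: 0.12 and 0.28, Middle: 0.08 and 0.25, High: 0.05 and 0.22); the dictionary layout and variable names are illustrative only.

```python
# Joint proportions (attitude, school level), assuming the table is read
# column by column as described above.
joint = {
    ("fear", "elementary"): 0.12, ("no fear", "elementary"): 0.28,
    ("fear", "middle"): 0.08,     ("no fear", "middle"): 0.25,
    ("fear", "high"): 0.05,       ("no fear", "high"): 0.22,
}

# P(B): marginal probability of being a middle school student.
p_b = sum(p for (attitude, level), p in joint.items() if level == "middle")

# P(A and B): fears the dentist AND is in middle school.
p_a_and_b = joint[("fear", "middle")]

# Conditional probability P(A|B) = P(A and B) / P(B).
p_a_given_b = p_a_and_b / p_b

print(round(p_b, 2))          # 0.33
print(round(p_a_given_b, 4))  # 0.2424
```

Under that reading, about 24% of middle school patients fear the dentist.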

Conditional Probability and Independence

• If fearing the dentist does not depend on age or school level, what would we expect the probability distribution in the previous example to look like?

• What does this imply about P(A|B)?

• If A and B are independent, what form should the conditional probability take?
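A short derivation of where these questions lead. It assumes the usual definition of independence, P(A and B) = P(A)P(B), which the slide does not state explicitly:

$$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)} = \frac{P(A)\,P(B)}{P(B)} = P(A)$$

So under independence, knowing B does not change the probability of A, and every cell of the joint table is the product of its row and column marginal proportions.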

Summarizing Bivariate Categorical Data

• Have studied bivariate continuous data (regression)
• Often have two (or more) categorical measurements taken on the same sampling unit
• Data usually summarized in 2-way tables
• Often called contingency tables

Test for Independence

Situation: We draw ONE random sample of predetermined size and record 2 categorical measurements
• Because we do not know in advance how many sampled units will fall into each category, neither the column totals nor the row totals are fixed

Example:

• Survey conducted by sampling 400 people who were questioned regarding union membership and attitude towards decreased spending on social programs

              Union   Non-Union   Total
Support       112     84          196
Indifferent   36      68          104
Opposed       28      72          100
Total         176     224         400

• Would like to see if the distribution of union membership is independent of support for social programs

• If the two distributions are independent, what does that say about the probability of a randomly selected individual falling into a particular category?
• What would the expected count be for each cell?

• What test statistic could we use?
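As a sketch of how these questions play out for the union survey, the Python below computes each expected count as (row total × column total) / n and then the Pearson chi-square statistic. Choosing the Pearson statistic is an assumption about what the lecture has in mind; the variable names are illustrative.

```python
# Observed counts from the union survey (rows: attitude, columns: membership).
observed = {
    "Support":     {"Union": 112, "Non-Union": 84},
    "Indifferent": {"Union": 36,  "Non-Union": 68},
    "Opposed":     {"Union": 28,  "Non-Union": 72},
}

row_totals = {r: sum(cells.values()) for r, cells in observed.items()}
col_totals = {}
for cells in observed.values():
    for c, count in cells.items():
        col_totals[c] = col_totals.get(c, 0) + count
n = sum(row_totals.values())

# Under independence, P(row and column) = P(row) * P(column), so the
# expected count is n * (row total / n) * (column total / n).
expected = {
    r: {c: row_totals[r] * col_totals[c] / n for c in col_totals}
    for r in observed
}

# Pearson chi-square statistic: sum over all cells of (O - E)^2 / E.
chi_square = sum(
    (observed[r][c] - expected[r][c]) ** 2 / expected[r][c]
    for r in observed
    for c in col_totals
)

for r in observed:
    print(r, {c: round(e, 2) for c, e in expected[r].items()})
print("chi-square =", round(chi_square, 2))  # chi-square = 27.85
```

The expected counts are 86.24 and 109.76 (Support), 45.76 and 58.24 (Indifferent), and 44 and 56 (Opposed). Large gaps between observed and expected counts push the statistic up.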

Formal Test

• Hypotheses:
• Test Statistic:
• P-Value:
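One standard way to fill in these three items is the Pearson chi-square test of independence, sketched below for a table with r rows, c columns, and n sampled units. The notation (O for observed counts, E for expected counts) is an assumption and may differ from Section 13.4 of the textbook.

$$H_0: \text{the two classifications are independent} \qquad H_a: \text{the two classifications are not independent}$$

$$\chi^2 = \sum_{\text{cells}} \frac{(O - E)^2}{E}, \qquad E = \frac{(\text{row total})(\text{column total})}{n}$$

$$\text{P-value} = P\left(\chi^2_{(r-1)(c-1)} \ge \chi^2_{\text{obs}}\right)$$

For the 3 × 2 union survey table, the reference distribution has (3 − 1)(2 − 1) = 2 degrees of freedom.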

Spurious Dependence

• Consider admissions from a fictional university by gender

         Male   Female   Total
Admit    490    280      770
Deny     210    220      430

         Male   Female
Admit    0.70   0.56
Deny     0.30   0.44

• Is there evidence of discrimination?

• Consider the same data, separated by school applied to:
• Business School:

         Male   Female
Admit    480    180
Deny     120    20

         Male   Female
Admit    0.80   0.90
Deny     0.20   0.10

• Law School:

         Male   Female
Admit    10     100
Deny     90     200

         Male   Female
Admit    0.10   0.33
Deny     0.90   0.67

• Simpson’s Paradox: Reversal of comparison due to aggregation
• Contradiction of initial finding because of presence of a lurking variable
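The reversal can be checked directly from the counts. The sketch below recomputes the admission rates within each school and then after pooling the two schools; the data structure and function names are illustrative.

```python
# (admitted, denied) counts by school and gender, taken from the tables.
counts = {
    "Business": {"Male": (480, 120), "Female": (180, 20)},
    "Law":      {"Male": (10, 90),   "Female": (100, 200)},
}

def admit_rate(admitted, denied):
    """Proportion of applicants admitted."""
    return admitted / (admitted + denied)

# Admission rate within each school.
by_school = {
    school: {gender: admit_rate(*c) for gender, c in groups.items()}
    for school, groups in counts.items()
}

# Admission rate after aggregating over schools.
overall = {}
for gender in ("Male", "Female"):
    admitted = sum(counts[school][gender][0] for school in counts)
    denied = sum(counts[school][gender][1] for school in counts)
    overall[gender] = admit_rate(admitted, denied)

for school, rates in by_school.items():
    print(school, {g: round(r, 2) for g, r in rates.items()})
print("Overall", {g: round(r, 2) for g, r in overall.items()})
```

Women have the higher admission rate in each school (0.90 vs 0.80 and 0.33 vs 0.10), yet the lower rate overall (0.56 vs 0.70). The lurking variable is the school applied to: 600 of the 700 men applied to the Business School, which admits most applicants, while 300 of the 500 women applied to the Law School, which admits few.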