Transcript Document

The Centre for Multilevel Modelling:

Research funded by the ESRC (UK) at the Institute of Education, London.

Web pages (introduction, software, articles, references, etc.):

http://mlwin.com/

Provides access to general information about multilevel modelling and MLwiN.

Multilevel Newsletter (electronic version, free): send an email request to receive this to [email protected]

Email discussion group: www.jiscmail.ac.uk/multilevel/

Multilevel (hierarchical) data structures 

Students nested within schools

Measurement occasions nested within subjects

(Multivariate) measurements nested within subjects

Survival times nested within clinics

Students nested within cells cross-classified by school and neighbourhood

Students changing schools over the course of a longitudinal study, so having multiple membership of schools.

Ignoring such structure risks incorrect standard errors and the loss of important information about between-unit variation. Models need to incorporate the hierarchical structure explicitly. Such models are sometimes referred to as 'mixed models', 'random coefficient models' or 'hierarchical models'. Variance components models are a special case.
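To make the point about standard errors concrete, here is a minimal simulation sketch. The number of schools, school size and variances are hypothetical choices, not values from the tutorial data. It generates pupils nested in schools with a school-level random effect and compares the naive standard error of the overall mean, which treats every pupil as independent, with one based on the school means. The naive value is too small by roughly the square root of the design effect $1 + (m - 1)\rho$, where $m$ is the school size and $\rho$ the intra-school correlation.

```python
import math
import random
import statistics

def simulate(n_schools=200, m=25, sigma_u=0.5, sigma_e=1.0, seed=1):
    """Hypothetical balanced 2-level data y_ij = u_j + e_ij, one list per school."""
    rng = random.Random(seed)
    schools = []
    for _ in range(n_schools):
        u = rng.gauss(0.0, sigma_u)                      # school effect u_j
        schools.append([u + rng.gauss(0.0, sigma_e) for _ in range(m)])
    return schools

def naive_se(schools):
    """SE of the grand mean if every pupil is (wrongly) treated as independent."""
    ys = [y for s in schools for y in s]
    return statistics.stdev(ys) / math.sqrt(len(ys))

def cluster_se(schools):
    """SE of the grand mean from the spread of school means (equal school sizes)."""
    means = [statistics.fmean(s) for s in schools]
    return statistics.stdev(means) / math.sqrt(len(means))

def design_effect(m, sigma_u, sigma_e):
    """1 + (m - 1) * rho, with rho the intra-school correlation."""
    rho = sigma_u**2 / (sigma_u**2 + sigma_e**2)
    return 1 + (m - 1) * rho

schools = simulate()
print(f"naive SE   = {naive_se(schools):.4f}")
print(f"cluster SE = {cluster_se(schools):.4f}")
print(f"ratio expected from design effect = {math.sqrt(design_effect(25, 0.5, 1.0)):.2f}")
```

With these settings $\rho = 0.2$, so the cluster-based standard error should come out roughly 2.4 times the naive one.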

Classifying structures

Simple hierarchy: [diagram: School, Pupil]

Cross classifications: [diagram: School, Pupil]

Multiple membership: [diagram: Schools, Neighbourhood, Pupil]
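The difference between these structures shows up in how each pupil's random part is assembled. Here is a minimal sketch with hypothetical identifiers and effect values. In a cross-classification each pupil has one school and one neighbourhood, and neither is nested in the other, so the random part adds a school effect and a neighbourhood effect. Under multiple membership a pupil's school effect is a weighted sum over the schools attended. The weights are assumed here to reflect time spent in each school and to sum to 1.

```python
from dataclasses import dataclass, field

@dataclass
class Pupil:
    neighbourhood: str
    # school id -> membership weight; a pupil in a single school has weight 1.0
    school_weights: dict = field(default_factory=dict)

def random_part(pupil, school_effects, neighbourhood_effects):
    """Cross-classified, multiple-membership random part:
    sum_k w_k * u_k over schools plus the neighbourhood effect v (level 1 residual omitted)."""
    total_w = sum(pupil.school_weights.values())
    if abs(total_w - 1.0) > 1e-9:
        raise ValueError("membership weights must sum to 1")
    u = sum(w * school_effects[s] for s, w in pupil.school_weights.items())
    return u + neighbourhood_effects[pupil.neighbourhood]

# Hypothetical effects, for illustration only
u = {"S1": 0.4, "S2": -0.2}
v = {"N1": 0.1, "N2": -0.3}

stayer = Pupil("N1", {"S1": 1.0})             # one school throughout
mover = Pupil("N2", {"S1": 0.5, "S2": 0.5})   # changed school halfway through
print(random_part(stayer, u, v))   # 0.4 + 0.1
print(random_part(mover, u, v))    # 0.5*0.4 + 0.5*(-0.2) - 0.3
```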

Building more complex structures

[Diagrams: Primary school, Secondary school, Pupil; School, Teacher, Pupil]

The 2-level model

Level 2 unit (j): School, Subject, Clinic
Level 1 unit (i): Student, Occasion, Patient

Example: students grouped within schools, exam results

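73 children in one school: level 1 variation

[Graph: exam score against prior reading score]

$$y_i = \beta_0 + \beta_1 x_i + e_i$$

where $y_i$ is the response, $\beta_0$ the intercept, $\beta_1$ the coefficient of the explanatory variable $x_i$, and $e_i$ the residual.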

Multilevel structure, several schools:

[Graph: exam score against prior reading score]

Full model (purely random variables in red):

$$y_{ij} = \beta_{0j} + \beta_1 x_{ij} + e_{ij}$$

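Traditional (regression) models estimate $\beta_{0j}$ and $\mathrm{var}(e_{ij}) = \sigma_e^2$. To generalise to the population of level 2 units we rewrite

$$\beta_{0j} = \beta_0 + u_j$$

to give

$$y_{ij} = \beta_0 + \beta_1 x_{ij} + u_j + e_{ij}$$

where $u_j$ is the level 2 residual and $e_{ij}$ the level 1 residual.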

$$\mathrm{var}(u_j) = \sigma_u^2,$$

typically Normal. Also 'non-independence':

$$\rho = \mathrm{cor}(y_{ij}, y_{kj}) = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}$$

This is the intra-unit correlation or, more generally, the Variance Partition Coefficient (VPC).
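As a worked check on the variance partition coefficient, the sketch below simulates a balanced random intercept model, with $\beta_1 = 0$ for simplicity and hypothetical values $\sigma_u^2 = 0.5$ and $\sigma_e^2 = 1$. It then recovers both variances with the classical one-way ANOVA (method of moments) estimator. That estimator is a simpler stand-in chosen for illustration, not the IGLS/RIGLS estimation used in MLwiN.

```python
import random
import statistics

def vpc(sigma_u2, sigma_e2):
    """Variance partition coefficient: share of the total variance at level 2."""
    return sigma_u2 / (sigma_u2 + sigma_e2)

def anova_estimates(groups):
    """Method-of-moments (sigma_u2, sigma_e2) for groups of equal size n."""
    J, n = len(groups), len(groups[0])
    means = [statistics.fmean(g) for g in groups]
    msb = n * statistics.variance(means)          # between mean square, E = s2e + n*s2u
    msw = sum(sum((y - m) ** 2 for y in g) for g, m in zip(groups, means)) / (J * (n - 1))
    return max((msb - msw) / n, 0.0), msw

rng = random.Random(42)
sigma_u2, sigma_e2 = 0.5, 1.0
groups = []
for _ in range(500):                               # 500 hypothetical schools of 20 pupils
    u = rng.gauss(0, sigma_u2 ** 0.5)
    groups.append([u + rng.gauss(0, sigma_e2 ** 0.5) for _ in range(20)])

su2, se2 = anova_estimates(groups)
print(f"true VPC = {vpc(sigma_u2, sigma_e2):.3f}, estimated VPC = {vpc(su2, se2):.3f}")
```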

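Extend to a random coefficient model, e.g.

$$y_{ij} = \beta_{0j} + \beta_{1j} x_{ij} + e_{ij}, \quad \beta_{0j} = \beta_0 + u_{0j}, \quad \beta_{1j} = \beta_1 + u_{1j}$$

so that

$$y_{ij} = \beta_0 + \beta_1 x_{ij} + (u_{0j} + u_{1j} x_{ij} + e_{ij})$$

with

$$\mathrm{var}(u_{0j}) = \sigma_{u0}^2, \quad \mathrm{var}(u_{1j}) = \sigma_{u1}^2, \quad \mathrm{cov}(u_{0j}, u_{1j}) = \sigma_{u01}$$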
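One consequence of the random coefficient model is that the between-unit variance is no longer constant. At a given $x$ it is $\mathrm{var}(u_{0j} + u_{1j}x) = \sigma_{u0}^2 + 2\sigma_{u01}x + \sigma_{u1}^2 x^2$, a quadratic in $x$. The small sketch below evaluates it; the parameter values are hypothetical.

```python
def level2_variance(x, s2_u0, s_u01, s2_u1):
    """Between-unit variance at covariate value x in a random slope model."""
    return s2_u0 + 2 * s_u01 * x + s2_u1 * x ** 2

def level2_correlation(s2_u0, s_u01, s2_u1):
    """Correlation between the intercept and slope residuals."""
    return s_u01 / (s2_u0 * s2_u1) ** 0.5

# Hypothetical values: positive covariance, so lines fan out for x above -s_u01/s2_u1
params = dict(s2_u0=0.09, s_u01=0.02, s2_u1=0.015)
for x in (-2, -1, 0, 1, 2):
    print(x, round(level2_variance(x, **params), 3))
print("intercept-slope correlation:", round(level2_correlation(**params), 2))
```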

Note the more general notation as used in MLwiN:

$$y_{ij} = \beta_{0ij} x_{0ij} + \beta_{1j} x_{1ij}$$

$$\beta_{0ij} = \beta_0 + u_{0j} + e_{0ij}, \quad \beta_{1j} = \beta_1 + u_{1j}, \quad x_{0ij} = 1$$

Residuals: $u_{0j}$ and $u_{1j}$ are level 2 residuals. We obtain sample estimates of them that have standard errors, which allows us to provide confidence intervals.

Note: a standard confidence interval allows us to judge a residual against a value of zero, e.g. is that school's effect significantly different from the overall mean? An 'overlap interval' allows us to compare two schools in terms of whether their intervals overlap. We use the latter below.

MLwiN example using tutorial examination data
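Before turning to the tutorial example, here is a minimal sketch of how such estimates arise in the simplest case: a random intercept model with the variances treated as known. The estimated school residual is the school's mean raw residual shrunk towards zero, and its comparative standard error depends on the school size $n_j$. School sizes, raw mean residuals and variance values below are hypothetical. The overlap multiplier $1.96\sqrt{2}/2 \approx 1.39$ is chosen so that, for two schools with equal standard errors, non-overlapping intervals correspond to a difference significant at about the 5% level. With unequal standard errors it is only approximate.

```python
import math

def shrunken_residual(mean_raw_residual, n_j, s2_u, s2_e):
    """Estimated level 2 residual and its comparative SE (variances treated as known)."""
    k = s2_u / (s2_u + s2_e / n_j)          # shrinkage factor, between 0 and 1
    u_hat = k * mean_raw_residual
    se = math.sqrt(k * s2_e / n_j)          # equals sqrt(s2_u*s2_e / (n_j*s2_u + s2_e))
    return u_hat, se

def interval(u_hat, se, z):
    return (u_hat - z * se, u_hat + z * se)

Z_CONF = 1.96                         # conventional 95% interval: compare with zero
Z_OVERLAP = 1.96 * math.sqrt(2) / 2   # about 1.39: compare two schools by overlap

# Hypothetical schools: (mean raw residual, number of pupils)
schools = {"A": (0.60, 80), "B": (-0.55, 60), "C": (0.60, 8)}
s2_u, s2_e = 0.09, 0.56
for name, (rbar, n) in schools.items():
    u_hat, se = shrunken_residual(rbar, n, s2_u, s2_e)
    lo, hi = interval(u_hat, se, Z_OVERLAP)
    print(f"{name}: u_hat={u_hat:+.3f}  overlap interval=({lo:+.3f}, {hi:+.3f})")
```

School C has the same raw mean as school A but far fewer pupils, so its estimate is shrunk more and its interval is wider.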

Value added: illustration for two schools

[Graph: Key Stage 2 score against Key Stage 1 score for School A, School B and all schools; points marked T1, T2, S1, S2]

A random coefficient model: value added scores are estimated residuals.

Applying a multilevel model to rankings of schools: start with the unadjusted ranking, then apply the value added adjustment and uncertainty estimation.

DO SCHOOLS DIFFER?

We can plot the school means for the tutorial data against their ranks, giving a graphical league table. Our response variable has been normalised, so its units are standard deviations: the difference between the highest school mean and the lowest school mean is 2 standard deviations. From this graph it appears that schools do have very different effects.

ADDING UNCERTAINTY INTERVALS AROUND THE ESTIMATES

We now put 95% overlap confidence intervals around our estimates of the means. Even taking account of sampling error, there are large, statistically significant differences between the school means.

The school with the lowest mean (school B) is highlighted in blue and the school with the highest mean (school A) in red.

WHAT ABOUT INTAKE ABILITY?

We also have data on tests the children took when they entered secondary school in year 7, that is, a measure of intake ability. The lower graph plots pupil-level attainment against pupil-level intake score. There are 4000 pupils and therefore 4000 points on the graph. Pupils from school A are picked out in red and pupils from school B in blue. We see that school A has more pupils with high intake scores. School A attracted more able pupils than school B, which must contribute to its higher outcome mean. Rather than looking at raw (unadjusted) school means, we should adjust our model for each school's intake. We will then be looking at the progress pupils make while attending a school, which is a more meaningful measure of school effectiveness.

ADJUSTING FOR INTAKE ABILITY

We can adjust for intake ability by regressing pupil attainment on pupil intake score. The model becomes multilevel because we allow each of our 65 schools to depart (be raised or lowered) from the overall regression line. These school-level departures are known as school-level residuals and can be thought of as a measure of the effect of the school. The results of the model are illustrated in the graph opposite. The central green line is the regression based on all pupils from all schools, from which the 65 school lines depart. The equation of this line is

Predicted attainment = 0.092 + 0.566 × intake

That is, on average, an increase of 1 unit in intake score is associated with an increase of 0.566 units in outcome attainment. We can see that, even after adjusting for intake score, school A has the largest positive residual (its line is at the top) and school B has the largest negative residual (its line is at the bottom). Can we now say that, having adjusted for pupil intake ability, school A is more effective than school B?
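In this variance components model every school line has the common slope 0.566 and is shifted only by its own residual $u_j$, so the predicted gap between two schools is the same at every intake score. The sketch uses the fitted equation above; the two residual values are hypothetical, not the tutorial estimates for schools A and B.

```python
def predicted_attainment(intake, u_j=0.0):
    """Fitted line 0.092 + 0.566*intake, shifted by the school residual u_j."""
    return 0.092 + u_j + 0.566 * intake

u_A, u_B = 0.3, -0.3          # hypothetical school residuals
for x in (-2.0, 0.0, 2.0):
    gap = predicted_attainment(x, u_A) - predicted_attainment(x, u_B)
    print(f"intake={x:+.1f}  gap A-B = {gap:.3f}")   # constant, equal to u_A - u_B
```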

ADJUSTING FOR INTAKE ABILITY – A CLOSER INSPECTION

The graphs below reveal some interesting patterns. In the top panel we see that the school residual (effect) for school A is still statistically different from that for school B. The school lines in the middle graph must be parallel, because the fitted model constructs each school's line by adding that school's residual to the intercept of the average (green) line. We could instead allow the school lines to have different slopes. The bottom graph suggests that lines with different slopes would be more realistic, certainly for schools A and B: eyeballing the graph, the points for school B suggest a flatter line than those for school A.

ALLOWING DIFFERENT SLOPES FOR THE SCHOOLS LINES

If we allow every school to depart from the overall average line in both its intercept and its slope, every school has an intercept residual and a slope residual. The corresponding 65 points are plotted in the top panel. We see that school A has the highest slope residual and the highest intercept residual. We therefore expect school A's line to be the steepest and to cross the y-axis (at x = 0) higher than all the other schools' lines. The middle panel shows that this is the case. Conversely, school B has the lowest intercept residual and a very low slope residual, which combine to give a flat line at the bottom of the set of school lines. Consider again the question: is school A more effective than school B? The size of the difference between the two schools depends on pupils' intake scores. For pupils with low intake scores the difference is small; for pupils with high intake scores it is large.
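With a random slope the gap between two schools becomes $(u_{0A} - u_{0B}) + (u_{1A} - u_{1B})x$, so it depends on intake. The sketch below borrows the fixed part of the earlier equation (0.092 + 0.566 × intake) for illustration only, since the fixed estimates change once slopes are allowed to vary. The residuals are hypothetical values chosen to mimic the pattern described: school A steep and high, school B flat and low.

```python
def school_line(intake, u0, u1, b0=0.092, b1=0.566):
    """Predicted attainment for a school with intercept residual u0 and slope residual u1."""
    return (b0 + u0) + (b1 + u1) * intake

A = dict(u0=0.25, u1=0.10)     # hypothetical: highest intercept and slope residuals
B = dict(u0=-0.25, u1=-0.15)   # hypothetical: low intercept, low slope

def gap(intake):
    return school_line(intake, **A) - school_line(intake, **B)

for x in (-2.0, 0.0, 2.0):
    print(f"intake={x:+.1f}  gap A-B = {gap(x):.2f}")   # 0.5 + 0.25*x
```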

ONCE MORE WITH CONFIDENCE INTERVALS

Below is a graph with just the lines for schools A and B along with their associated confidence intervals (now conventional 95% intervals).

Remember that, in unadjusted terms, we are comparing the top and bottom schools. Once we correct for intake and allow schools to have their own intercepts and slopes, we find that we cannot definitively claim that school A is more effective than school B: for low intake ability pupils there is no statistically significant difference between the two schools. Schools are differentially effective for different types of pupils. Here we have explored differential school effectiveness only in terms of intake ability; schools can also be differentially effective with respect to other pupil characteristics, for example gender, ethnicity and socio-economic status. Multilevel modelling provides a framework for describing and explaining these between-school differences.

CONTEXTUAL EFFECTS

Another reason why multilevel modelling is attractive to social science researchers is that it is useful for exploring interactions between people and the social contexts in which they are situated. For example, do low ability pupils fare better when they are educated alongside higher ability pupils, or are they discouraged and fare worse? We can categorise our 65 schools into 3 groups with respect to the intake scores of their pupils, as follows:

• for each school, calculate the mean intake score of all its pupils;
• rank these 65 means;
• assign schools in the bottom quartile to a low group, the middle 50% to a middle group and the top 25% to a high group.

We now have three types of school, low, middle and high, which correspond to low ability, middle ability and high ability schools.
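The three steps can be sketched as follows. The pupil records are hypothetical, and the group sizes are set by rounding a quarter of the number of schools, which is one choice among several for handling the cut points.

```python
import statistics
from collections import defaultdict

def school_ability_groups(pupils):
    """pupils: iterable of (school_id, intake_score).
    Returns school_id -> 'low' (bottom 25%), 'middle' (middle 50%) or 'high' (top 25%)."""
    scores = defaultdict(list)
    for school, intake in pupils:
        scores[school].append(intake)
    # 1. mean intake score for each school
    means = {s: statistics.fmean(v) for s, v in scores.items()}
    # 2. rank the school means from lowest to highest
    ranked = sorted(means, key=means.get)
    n = len(ranked)
    quarter = round(0.25 * n)
    groups = {}
    for rank, school in enumerate(ranked):
        # 3. bottom quarter -> low, top quarter -> high, the rest -> middle
        if rank < quarter:
            groups[school] = "low"
        elif rank >= n - quarter:
            groups[school] = "high"
        else:
            groups[school] = "middle"
    return groups

pupils = [("s%d" % k, float(k)) for k in range(1, 9)]   # hypothetical: one pupil per school
print(school_ability_groups(pupils))
```

With 65 schools this gives 16 low, 33 middle and 16 high schools.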

We can include the school ability contextual variable in a multilevel model and allow it to interact with pupil-level intake ability. This tells us how pupils across the spectrum of intake ability are affected by being educated amongst low, middle or high ability peers. The graph on the next slide shows the results for low versus high ability schools.

THE CONTEXTUAL EFFECT OF PEER GROUP ABILITY

Consider first a high ability pupil. The value 2.6 on the x axis corresponds to a pupil whose score on entry to secondary school is 2.6 standard deviations above the mean. If that pupil attends a school where her peers are on average of low ability, her predicted outcome attainment is 1.0 standard deviation above the average attainment; this is the height of the blue line at x = 2.6. However, if the same pupil attended a high ability school, the model predicts that her outcome attainment would be 1.8 standard deviations above the mean outcome attainment; this is the height of the green line at x = 2.6. For high ability pupils the model therefore suggests a large positive effect of being educated amongst high ability pupils. The difference between the green and blue lines represents the effect of being in a high ability group. As we move down the spectrum of pupil intake ability (leftwards along the x axis), this effect lessens. For values of x less than -1.8 the blue line is higher, which means that very low ability pupils (x < -1.8) actually fare better in a low ability school than in a high ability school.
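Assuming both lines are straight, as they are when school type interacts linearly with intake, the values quoted above (read off the graph) pin down the gap between them: 0.8 at $x = 2.6$ and zero at $x = -1.8$. A sketch of that arithmetic:

```python
def gap_line(x1, gap1, x_cross):
    """Linear gap g(x) = slope * (x - x_cross): zero at x_cross and gap1 at x1."""
    slope = gap1 / (x1 - x_cross)
    return lambda x: slope * (x - x_cross)

# Values read from the graph: at x = 2.6 the high ability school line is at 1.8 and
# the low ability school line at 1.0; the two lines cross at x = -1.8.
gap = gap_line(2.6, 1.8 - 1.0, -1.8)
for x in (2.6, 0.0, -1.8, -2.5):
    print(f"x={x:+.1f}  high minus low ability school = {gap(x):+.3f}")
```

A negative gap, as at $x = -2.5$, means the pupil is predicted to do better in the low ability school.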

Understanding the sources of differential parenting: the role of child and family level effects

Mapping multilevel terminology to psychological terminology
• Level 2: family, shared environment. Variables: family SES, marital problems
• Level 1: child, non-shared environment, child specific. Variables: age, sex, temperament

Background
• Recent studies in developmental psychology and behavioural genetics emphasise that the non-shared environment is much more important than the shared environment in explaining children's adjustment, and this has led to a focus on the non-shared environment (Plomin et al., 1994; Turkheimer & Waldron, 2000).
• Has this meant that we have ignored the role of the shared family context, both empirically and conceptually?

Background
• One key aspect of the non-shared environment that has been investigated is differential parental treatment of siblings.
• Differential treatment predicts differences in sibling adjustment.
• What are the sources of differential treatment?
• Child specific/non-shared: age, temperament, biological relatedness.
• Can family-level, shared environmental factors influence differential treatment?

The Stress/Resources Hypothesis

Do family contexts (shared environment) increase or decrease the extent to which children within the same family are treated differently?

“Parents have a finite amount of resources in terms of time, attention, patience and support to give their children. In families in which most of these resources are devoted to coping with economic stress, depression and/or marital conflict, parents may become less consciously or intentionally equitable and more driven by preferences or child characteristics in their childrearing efforts” (Henderson et al., 1996).

This is the hypothesis we wish to test. We operationalise the stress/resources hypothesis using four contextual variables: socioeconomic status, single parenthood, large family size and marital conflict.

How differential parental treatment has been analysed

Previous analyses in the literature exploring the sources of differential parental treatment ask the mother to rate two siblings in terms of the treatment (positive or negative) she gives to each child.

The difference between these two treatment scores is then analysed.

This approach has several major limitations…

The sibling pair difference model for exploring determinants of differential parenting:

$$(y_{1i} - y_{2i}) = \beta_0 + \beta_1 x_{1i} + \dots$$

where $y_{1i}$ and $y_{2i}$ are the parental ratings for siblings 1 and 2 in family $i$, and $x_{1i}$ is a family-level variable, for example family SES.

Problems

• One measurement per family makes it impossible to separate shared and non-shared random effects.

• All information about the magnitude of the response is lost: the pairs (2, 4) and (22, 24) give the same difference.

• It is not possible to introduce level 1 (non-shared) variables, since the data have been aggregated to level 2.

• Family sizes larger than two cannot be handled.
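The first problem can be seen directly. Under a random intercept model with no covariates, two siblings in family $j$ give $y_{1j} - y_{2j} = e_{1j} - e_{2j}$: the family effect $u_j$ cancels, so $\mathrm{var}(y_{1j} - y_{2j}) = 2\sigma_e^2$ whatever the value of $\sigma_u^2$. A quick simulation under hypothetical variances:

```python
import random
import statistics

def var_of_differences(sigma_u2, sigma_e2, n_families=20000, seed=7):
    """Simulate sibling pairs y = u_j + e_ij and return the variance of y_1j - y_2j."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_families):
        u = rng.gauss(0, sigma_u2 ** 0.5)                 # shared family effect
        y1 = u + rng.gauss(0, sigma_e2 ** 0.5)
        y2 = u + rng.gauss(0, sigma_e2 ** 0.5)
        diffs.append(y1 - y2)                            # u cancels here
    return statistics.variance(diffs)

for s2u in (0.0, 5.0):
    print(s2u, round(var_of_differences(s2u, sigma_e2=3.8), 2))   # both near 2 * 3.8
```

Because the difference score carries no trace of $u_j$, no amount of such data can estimate the shared-environment variance.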

With a multilevel model:

$$y_{ij} = \beta_0 + \beta_1 x_{1ij} + \beta_2 x_{2j} + u_j + e_{ij}, \quad u_j \sim N(0, \sigma_u^2), \quad e_{ij} \sim N(0, \sigma_e^2)$$

where $y_{ij}$ is the $j$th mother's rating of her treatment of her $i$th child, $x_{1ij}$ are child-level (non-shared) variables, $x_{2j}$ are family-level (shared) variables, and $u_j$ and $e_{ij}$ are the family and child (shared and non-shared environment) random effects.

The level 1 variance $\sigma_e^2$ is our measure of differential parenting.

Advantages of the multilevel approach
• Can handle more than two children per family.
• Unconfounds family and child, allowing estimation of family- and child-level fixed and random effects.
• Can model the level of parenting and differential parenting in the same model.

Overall Survey Design
• National Longitudinal Survey of Children and Youth (NLSCY)
• Statistics Canada survey, representative sample of children across the provinces
• Nested design includes up to 4 children per family
• PMK respondent
• 4-11 year old children
• Criteria: another sibling in the age range, living with at least one biological parent, 4 years of age or older
• 8,474 children
• 3,860 families
• Families with 4 children: 60; with 3 children: 630; with 2 children: 3,157

Measures of parental treatment of the child, derived from factor analyses:

• PMK report of positive parenting: frequency of praise of child, talk or play focusing on child, activities enjoyed together (α = .81)

• PMK report of negative parenting: frequency of disapproval, annoyance, anger, mood-related punishment (α = .71)

• Today we will talk about positive parenting.

PMK is the 'person most knowledgeable' about the child, here a parent.

Child-specific factors
• Age
• Gender
• Child position in family
• Negative emotionality
• Biological relatedness to father and mother

Family context factors
• Socioeconomic status
• Family size
• Single parent status
• Marital dissatisfaction

Model 1: Null Model

$$y_{ij} = \beta_0 + u_j + e_{ij}, \quad u_j \sim N(0, \sigma_u^2), \quad e_{ij} \sim N(0, \sigma_e^2)$$

$$\hat\beta_0 = 12.51\ (0.04), \quad \hat\sigma_u^2 = 5.13\ (0.17), \quad \hat\sigma_e^2 = 3.8\ (0.08)$$

The baseline estimate of differential parenting is 3.8. We can now add further shared and non-shared explanatory variables and judge their effect on differential parenting by the reduction in the level 1 variance.
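The null model splits the variance of positive parenting into a family (shared) share and a child (non-shared) share: $\hat\sigma_u^2/(\hat\sigma_u^2 + \hat\sigma_e^2) = 5.13/(5.13 + 3.8) \approx 0.57$. The sketch below simulates data using these estimates as true values and the family-size mix from the survey design (60 four-child, 630 three-child and 3,157 two-child families). It then recovers both variances with the unbalanced one-way ANOVA estimator, which handles families of different sizes. That estimator is a simple stand-in chosen for illustration, not the estimation method used in the study.

```python
import random
import statistics

def anova_unbalanced(groups):
    """Method-of-moments (sigma_u2, sigma_e2) for groups of unequal size."""
    N, J = sum(len(g) for g in groups), len(groups)
    grand = sum(sum(g) for g in groups) / N
    means = [statistics.fmean(g) for g in groups]
    ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ssw = sum(sum((y - m) ** 2 for y in g) for g, m in zip(groups, means))
    msb, msw = ssb / (J - 1), ssw / (N - J)
    n0 = (N - sum(len(g) ** 2 for g in groups) / N) / (J - 1)   # effective family size
    return max((msb - msw) / n0, 0.0), msw

rng = random.Random(2024)
b0, s2u, s2e = 12.51, 5.13, 3.8          # null-model estimates used as true values
sizes = [4] * 60 + [3] * 630 + [2] * 3157
families = []
for n in sizes:
    u = rng.gauss(0, s2u ** 0.5)
    families.append([b0 + u + rng.gauss(0, s2e ** 0.5) for _ in range(n)])

est_u, est_e = anova_unbalanced(families)
print(f"VPC (null model) = {s2u / (s2u + s2e):.2f}")
print(f"recovered: sigma_u2 = {est_u:.2f}, sigma_e2 = {est_e:.2f}")
```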

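Model 2: expanded model

$$\begin{aligned} y_{ij} = {} & \beta_{0j} + \beta_{1j}\,\mathrm{age}_{ij} + \beta_2\,\mathrm{age}_{ij}^2 + \beta_3\,\mathrm{girl}_{ij} + \beta_4\,\mathrm{notBioM}_{ij} + \beta_5\,\mathrm{notBioF}_{ij} + \beta_6\,\mathrm{oldestSib}_{ij} + \beta_7\,\mathrm{midSib}_{ij} \\ & + \beta_8\,\mathrm{hses}_j + \beta_9\,\mathrm{famsize}_j + \beta_{10}\,\mathrm{loneParent}_j + \beta_{11}\,\mathrm{allGirls}_j + \beta_{12}\,\mathrm{mixedGender}_j + \beta_{13}\,\mathrm{maritalprb}_j \\ & + \beta_{14}\,\mathrm{famsize}_j \times \mathrm{age}_{ij} + e_{ij} \end{aligned}$$

$$\beta_{0j} = \beta_0 + u_{0j}, \quad \beta_{1j} = \beta_1 + u_{1j}$$

$$\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix} \sim N(0, \Omega_u), \quad \Omega_u = \begin{pmatrix} \sigma_{u0}^2 & \\ \sigma_{u01} & \sigma_{u1}^2 \end{pmatrix}, \quad e_{ij} \sim N(0, \sigma_e^2)$$

The notation $(u_{0j}, u_{1j})^{T} \sim N(0, \Omega_u)$ means that the intercept and slope residuals at level 2 have Normal distributions, each with a variance, and with a covariance between them.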

Positive parenting

Child-level predictors
• The strongest predictor of positive parenting is age: younger siblings get more attention. This relationship is moderated by family membership.
• Non-biological mothers and non-biological fathers reduce positive parenting.
• Oldest sibling > youngest sibling > middle siblings.

Family-level predictors
• Household SES increases positive parenting.
• Marital dissatisfaction, increasing family size, and mixed-gender or all-girl sibships all decrease positive parenting.
• Lone parenthood has no effect.

Differential parenting

Modelling age reduced the level 1 variance (our measure of differential parenting) from 3.8 to 2.3, a reduction of 40%. Other explanatory variables, both child-specific and family-level (shared environment), provide no significant reduction in the level 1 variation.
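The 40% figure is the proportional reduction in the level 1 variance between the null model and the model with age. A one-line check, taking the two rounded estimates as given:

```python
def proportional_reduction(before, after):
    """Share of a variance component accounted for by adding predictors."""
    return (before - after) / before

print(f"{proportional_reduction(3.8, 2.3):.1%}")   # 39.5%, i.e. about 40%
```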

Does this mean that there is no evidence to support the stress/resources hypothesis?

Testing the stress/resources hypothesis
• The mean and the variance can be modelled simultaneously. So far we have modelled the mean, but not the variance, in terms of the shared environment.

• We can elaborate Model 2 by allowing the level 1 variance to be a function of the family-level variables household socioeconomic status, large family size and marital conflict. That is,

$$\sigma_{ej}^2 = w_0 + 2w_1\,\mathrm{hses}_j + w_2\,\mathrm{hses}_j^2 + 2w_3\,\mathrm{marital}_j + 2w_4\,(\mathrm{maritalprb}\cdot\mathrm{ses})_j + 2w_5\,\mathrm{familysize}_j$$

Estimates (standard errors) of the $w$ parameters include 1.84 (0.1), 0.29 (0.13), 0.23 (0.04), 0.11 (0.05) and 0.17 (0.07).

The reduction in the deviance is 78 on 7 degrees of freedom, so the improvement is highly significant.
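The significance claim can be checked against the chi-squared distribution with 7 degrees of freedom, the usual reference distribution for a change in deviance. A standard-library-only sketch using the closed-form upper tail for integer degrees of freedom:

```python
import math

def chi2_sf(x, k):
    """Upper-tail probability P(X > x) for a chi-squared variable with integer k >= 1 df."""
    if k % 2 == 0:
        # even k: exp(-x/2) * sum_{i < k/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, k // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # odd k: P(|Z| > sqrt(x)) + 2*phi(sqrt(x)) * sum_r x^((2r-1)/2) / (1*3*...*(2r-1))
    root = math.sqrt(x)
    tail = math.erfc(root / math.sqrt(2))
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = root, 0.0
    for r in range(1, (k - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return tail + 2 * density * total

print(f"P(chi2 with 7 df > 78) = {chi2_sf(78, 7):.1e}")   # far below 0.001
```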

Graphically:

[Graph against household SES (from -2.0 to 2.0), with lines for: family size = 2, no marital problems; family size = 2, marital problems; family size > 2, marital problems; family size > 2, no marital problems]

Conclusion
• We have found strong support for the stress/resources hypothesis. That is, although differential parenting is a child-specific factor that drives differential adjustment, differential parenting is itself influenced by family-level as well as child-specific factors.

• This challenges the current tendency in developmental psychology and behavioural genetics to focus on child specific factors.

• Multilevel models fitting complex level 1 variation need to be employed to uncover these relationships.

Next steps

This analysis was based on the first wave of the National Longitudinal Survey of Children and Youth. We now have a further wave of data, which allows us to model adjustment outcomes at wave 2 in terms of:
• differential parenting at wave 1 ($e_{ij}$);
• family mean positive parenting at wave 1 ($\beta_0 + u_j$);
• other family and child-specific explanatory variables.

This will allow us to assess the relative importance of differential parenting for subsequent adjustment.

Even more complex structures

[Classification diagrams. ALSPAC: Area, Primary, P. Cohort, P. Teacher, Secondary, S. Cohort, S. Teacher, Class, Pupil. Second diagram: Projects, Group, Pupil]

Analysis of data on segregation/diversity

• A modelling approach assumes that the observed data reflect an underlying structure of interest.
• The response is the proportion of disadvantaged (free school meals, FSM) children in a school.
• Intake 'cohorts' of children are 'nested' within schools, 'nested' within areas.
• Does the underlying variation between schools and between areas change over time?

This can be studied via a multilevel model as follows:

$$p_{ij} \sim \mathrm{Bin}(\pi_{ij}, n_{ij}), \quad \mathrm{logit}(\pi_{ij}) = \log[\pi_{ij}/(1 - \pi_{ij})] = (X\beta)_{ij} + v_j + u_{ij}$$

$$v_j \sim N(0, \sigma_v^2), \quad u_{ij} \sim N(0, \sigma_u^2)$$

Here $p_{ij}$ is the observed proportion at any one time in the $i$th school in the $j$th area, based on $n_{ij}$ pupils, and $\pi_{ij}$ is the underlying probability, which is decomposed into a school effect ($u_{ij}$) and an area effect ($v_j$). Interest lies in the variation between schools ($\sigma_u^2$) and between areas ($\sigma_v^2$). If the variation is Normal, this is a complete summary of the data and avoids arbitrary index definitions.
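On the logit scale the area and school effects simply add; converting back gives the underlying probability $\pi_{ij}$. The sketch below draws school probabilities using the 1999 variance estimates from Table 1 ($\sigma_u^2 = 0.691$, $\sigma_v^2 = 0.506$) and an assumed fixed part of $-1.5$ on the logit scale, which is not reported in the source. It shows how the two variances translate into a spread of FSM proportions. The numbers of areas and schools are also hypothetical.

```python
import math
import random
import statistics

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(z):
    return 1 / (1 + math.exp(-z))

def simulate_school_probs(xb, s2_v, s2_u, n_areas=100, schools_per_area=20, seed=3):
    """pi_ij = inv_logit(xb + v_j + u_ij): v_j area effect, u_ij school effect."""
    rng = random.Random(seed)
    probs = []
    for _ in range(n_areas):
        v = rng.gauss(0, s2_v ** 0.5)
        probs += [inv_logit(xb + v + rng.gauss(0, s2_u ** 0.5)) for _ in range(schools_per_area)]
    return probs

probs = simulate_school_probs(xb=-1.5, s2_v=0.506, s2_u=0.691)
q = statistics.quantiles(probs, n=10)
print(f"median pi = {statistics.median(probs):.2f}, 10th-90th percentile = {q[0]:.2f}-{q[-1]:.2f}")
```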

[Graph: distribution of logit($p_{ij}$)]

We see that this distribution is close to Normal, and this is confirmed by an analysis of residuals.

Results follow.

Note that we do not have data for each intake year, only for the whole school, so successive years overlap.

Table 1. Variance estimates (standard errors) for each year

Year    Between school    Between LEA      Total
1994    0.625 (0.016)     0.491 (0.066)    1.116
1995    0.636 (0.016)     0.522 (0.072)    1.158
1996    0.650 (0.016)     0.503 (0.064)    1.153
1997    0.660 (0.017)     0.498 (0.069)    1.158
1998    0.685 (0.017)     0.506 (0.068)    1.191
1999    0.691 (0.017)     0.506 (0.068)    1.197

Note the 11% increase in between-school variance from 1994 to 1999.

We can compare 1994 and 1999 formally since they involve different pupils. Variances from a joint analysis of 1994 and 1999 only:

          1994             1999             Correlation    Test 1994 - 1999
Schools   0.636 (0.016)    0.707 (0.017)    0.95           P < 0.001
LEAs      0.490 (0.066)    0.508 (0.069)    0.98           P > 0.10
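The totals and the 11% figure follow directly from Table 1. A short check, with the values transcribed from the table:

```python
# (between school, between LEA, reported total) from Table 1
table1 = {
    1994: (0.625, 0.491, 1.116),
    1995: (0.636, 0.522, 1.158),
    1996: (0.650, 0.503, 1.153),
    1997: (0.660, 0.498, 1.158),
    1998: (0.685, 0.506, 1.191),
    1999: (0.691, 0.506, 1.197),
}

for year, (school, lea, total) in table1.items():
    assert abs(school + lea - total) < 1e-9, year   # each total is school + LEA
    print(year, f"school share of total = {school / total:.2f}")

increase = (table1[1999][0] - table1[1994][0]) / table1[1994][0]
print(f"between-school increase 1994-1999: {increase:.1%}")   # about 11%
```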



An analysis of the percentage of schools within an LEA that have control over their own admissions shows no overall effect on the proportion with FSM, but between-school variance increases as the percentage of schools controlling admissions increases (Figure 2).

Figure 2. Between-school variance for selective and non-selective LEAs in 1994 and 1999, by percentage of admission-controlling schools.

[Lines: 1999 selective; 1994 selective; 1999 not selective; 1994 not selective]

Extensions and issues:

• Cross classifications: Primary and secondary school.

• Multiple membership: mobility among schools.

• Multivariate data

• Factor models