Transcript PROJEKT II

Advanced statistics for master's students: Loglinear models

Loglinear analysis

- A method for the analysis of contingency tables with two or more dimensions

Other approaches for contingency tables (especially two-dimensional ones):
1) Chi-square test of independence and adjusted residuals (short repetition)
2) Correspondence analysis (lecture 9); this procedure can also handle more than two dimensions

The main goal of loglinear analysis is to find dependencies in higher-dimensional contingency tables.
- It is a collection of several techniques, so there are several possibilities (e.g. SPSS has three procedures for loglinear analysis):
1) Loglinear
2) Model selection
3) Logit (not included in the lecture)

SM 152

Loglinear models (Literature)

Many monographs:
- Agresti (2002), Wiley
- Simonoff (2003), Springer
- Xie (2000)
- Knoke, Burke (1980), Sage
- Ishii-Kuntz (1994), Sage

In Czech: Hebák et al. (2005), Vícerozměrné statistické metody s aplikacemi, volume 3, chapter 1

Loglinear Models

- We try to find a model that describes the relationship between two or more categorical (nominal or ordinal) variables.
- Usually these are models for nominal variables (sometimes only dichotomies), but models for ordinal data can also be used (SPSS offers only limited possibilities here).
- There is no distinction between dependent and independent variables (logit models do make this distinction).

Contingency tables – some descriptive statistics

- Frequencies
- Percentages: row, column or total?
- Odds: a new measure for contingency tables
- Odds ratio as a single number for 2x2 contingency tables
- Higher-order odds and odds ratios
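As a small illustration of odds and the odds ratio in a 2x2 table, here is a minimal Python sketch. The counts and the group/category names are made up for the example and do not come from the lecture.

```python
# Hypothetical 2x2 table (made-up counts, not from the lecture):
#              yes   no
# group A       30   10
# group B       20   40
table = [[30, 10],
         [20, 40]]


def odds(row):
    """Odds of the first category against the second within one row."""
    return row[0] / row[1]


odds_a = odds(table[0])        # 30 / 10
odds_b = odds(table[1])        # 20 / 40
odds_ratio = odds_a / odds_b   # one number summarising the whole 2x2 table

# Equivalent cross-product form: (n11 * n22) / (n12 * n21)
cross_product = (table[0][0] * table[1][1]) / (table[0][1] * table[1][0])

print(odds_a, odds_b, odds_ratio, cross_product)
```

An odds ratio of 1 would mean the same odds in both rows, i.e. no association; the further the ratio is from 1 (in either direction), the stronger the association.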

Contingency tables – basics from bachelor's study

Dependence of two nominal/ordinal variables and adjusted residuals
- Null hypothesis: independence of the variables
- Alternative hypothesis: dependence
- Logic of the chi-square test: differences between the model of independence (the hypothesis, expected frequencies) and the real data (observed frequencies)
- The chi-square test of independence and the logic of the same test in loglinear models
- SPSS example including adjusted residuals
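The logic of the test can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 2x3 table is made up, and the adjusted residual uses the usual textbook formula (residual divided by its estimated standard error), which may differ in detail from what a given program prints.

```python
import math

# Hypothetical 2x3 table of observed frequencies (made-up counts).
observed = [[20, 30, 50],
            [30, 30, 40]]

n_rows, n_cols = len(observed), len(observed[0])
row_tot = [sum(row) for row in observed]
col_tot = [sum(row[j] for row in observed) for j in range(n_cols)]
n = sum(row_tot)

# Expected frequencies under the model of independence:
# row total * column total / N
expected = [[row_tot[i] * col_tot[j] / n for j in range(n_cols)]
            for i in range(n_rows)]

# Pearson chi-square: squared differences between data and model,
# scaled by the expected frequencies.
chi_square = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
                 for i in range(n_rows) for j in range(n_cols))
df = (n_rows - 1) * (n_cols - 1)

# Adjusted residuals: (O - E) / sqrt(E * (1 - row share) * (1 - column share))
adjusted = [[(observed[i][j] - expected[i][j])
             / math.sqrt(expected[i][j]
                         * (1 - row_tot[i] / n)
                         * (1 - col_tot[j] / n))
             for j in range(n_cols)]
            for i in range(n_rows)]

print(expected, chi_square, df)
print(adjusted)
```

Adjusted residuals are approximately standard normal under independence, so cells with absolute values above roughly 2 are the ones that deviate from the model.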

Excursus: working with SPSS syntax

Two possibilities for contingency tables:
1) We have the original data and use two variables.
2) We do not have the original data, only the contingency table; we use a third variable as the weight variable.

Example of the second approach:

* Each case is one cell of the table: sex, edu and the cell frequency.
data list free / sex edu count.
begin data
1 1 8  1 2 11  1 3 5  1 4 7
2 1 12  2 2 5  …… etc.
end data.

* Weight each cell by its frequency.
weight by count.

value labels sex 1 "male" 2 "female".

value labels edu 1 "basic education" 2 "secondary education" 3 "tertiary education".


Loglinear analysis

Basic statistical idea: to model the frequencies in contingency tables.

Excursus: the logic of ANOVA
- Loglinear analysis is similar to ANOVA, but the effects are multiplied rather than summed (see e.g. Field's explanation of these similarities).
- It is possible to use the effect of the row variable, the effect of the column variable and also interaction effects (the impact of combinations of the row and column variables together).

Methodology: if we use more than two variables, we carry out an elaboration (we take the impact of the other variables into account; see Babbie etc.).
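The multiplicative idea can be written out for a two-way table. The notation below is an assumption following a common textbook convention, not the lecture's own: $F_{ij}$ is the expected frequency in row $i$ and column $j$, $\eta$ is an overall constant, and $\tau$ are the multiplicative effects of the row variable $A$, the column variable $B$ and their interaction.

```latex
% Multiplicative form: effects are multiplied
F_{ij} = \eta \, \tau_i^{A} \, \tau_j^{B} \, \tau_{ij}^{AB}

% Taking logarithms gives an additive, ANOVA-like form
\ln F_{ij} = \lambda + \lambda_i^{A} + \lambda_j^{B} + \lambda_{ij}^{AB},
\qquad
\lambda = \ln \eta,\;
\lambda_i^{A} = \ln \tau_i^{A},\;
\lambda_j^{B} = \ln \tau_j^{B},\;
\lambda_{ij}^{AB} = \ln \tau_{ij}^{AB}
```

The second line is where the name "loglinear" comes from: the model is linear in the logarithms of the expected frequencies.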

Terminology

Saturated model
- A model with all variables and all possible interactions. It fully explains the observed frequencies (observed frequencies = expected frequencies) but is not useful, because it cannot be tested.

Real model (non-saturated)
- Some interaction or variable is missing. It does not fully explain the observed frequencies, but it can be tested: the expected frequencies from the model can be compared with the observed frequencies in the data. (The model estimates the contingency table of the population; the observed frequencies come from a sample!)

Residual
- The difference between the observed frequencies and the frequencies from the model. Residuals can be statistically tested, so we can identify problems in our models.
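The difference between a saturated and a non-saturated model can be sketched in Python. Assumptions: the counts are made up, the non-saturated model used here is the model of independence, and fit is measured with the likelihood-ratio statistic G², a statistic commonly used for loglinear models (the lecture does not name a specific one).

```python
import math


def g_squared(observed, expected):
    """Likelihood-ratio statistic G^2 = 2 * sum(O * ln(O / E)) over all cells."""
    return 2 * sum(o * math.log(o / e)
                   for row_o, row_e in zip(observed, expected)
                   for o, e in zip(row_o, row_e)
                   if o > 0)  # empty cells contribute nothing to the sum


# Hypothetical 2x2 table of observed frequencies (made-up counts).
observed = [[30, 10],
            [20, 40]]

# Saturated model: expected frequencies equal the observed ones.
saturated = [[float(o) for o in row] for row in observed]

# Non-saturated model (independence): row total * column total / N.
row_tot = [sum(row) for row in observed]
col_tot = [sum(col) for col in zip(*observed)]
n = sum(row_tot)
independence = [[r * c / n for c in col_tot] for r in row_tot]

# Residuals: observed minus expected under the non-saturated model.
residuals = [[o - e for o, e in zip(row_o, row_e)]
             for row_o, row_e in zip(observed, independence)]

g2_saturated = g_squared(observed, saturated)        # perfect fit, nothing to test
g2_independence = g_squared(observed, independence)  # misfit that can be tested

print(residuals)
print(g2_saturated, g2_independence)
```

The saturated model always yields a statistic of zero with zero degrees of freedom, which is why it cannot be tested; the non-saturated model leaves residuals whose size the statistic summarises.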

Loglinear model with only 2 variables

Saturated model
- 2 variables and their interaction

Model of independence (see also the chi-square test above)
- only the row and column variables, no interaction

Residuals
- differences between the observed frequencies and the expected frequencies from the model of independence
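For the saturated two-variable model, the parameters can be computed directly from the logarithms of the observed frequencies. The sketch below assumes made-up counts and sum-to-zero (effect) coding; the symbol names `lam`, `lam_a`, `lam_b` and `lam_ab` are hypothetical, and statistical software may use a different parameterisation and therefore report different numbers.

```python
import math

# Hypothetical 2x2 table of observed frequencies (made-up counts).
observed = [[30, 10],
            [20, 40]]

n_rows, n_cols = len(observed), len(observed[0])
logs = [[math.log(x) for x in row] for row in observed]

grand = sum(sum(row) for row in logs) / (n_rows * n_cols)
row_mean = [sum(logs[i]) / n_cols for i in range(n_rows)]
col_mean = [sum(logs[i][j] for i in range(n_rows)) / n_rows
            for j in range(n_cols)]

# Saturated model parameters under sum-to-zero constraints (an assumption):
lam = grand                                   # overall level
lam_a = [m - grand for m in row_mean]         # row variable effects
lam_b = [m - grand for m in col_mean]         # column variable effects
lam_ab = [[logs[i][j] - row_mean[i] - col_mean[j] + grand
           for j in range(n_cols)]
          for i in range(n_rows)]             # interaction effects

# Adding the parameters and exponentiating reproduces the observed table.
fitted = [[math.exp(lam + lam_a[i] + lam_b[j] + lam_ab[i][j])
           for j in range(n_cols)]
          for i in range(n_rows)]

# In a 2x2 table the log odds ratio equals four times the interaction term.
log_odds_ratio = math.log((observed[0][0] * observed[1][1])
                          / (observed[0][1] * observed[1][0]))

print(lam_ab)
print(fitted)
print(log_odds_ratio, 4 * lam_ab[0][0])
```

The model of independence is the same model with all interaction terms constrained to zero. Its remaining parameters are then re-estimated rather than reused from this decomposition.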

Hierarchical models
- models that include all lower-order interactions and all variables
- Abbreviations for hierarchical models and their meaning: (ABC), (AB)C etc.


Odds and their usage in loglinear modeling

A basic concept, especially for the interpretation of results (for loglinear models, logistic regression etc.)
- Odds can be derived from the parameters of loglinear (or logit) models.
- For statistical reasons we use LOGIT = log(ODDS): the logarithmic transformation changes multiplicative models into additive ones, so we work with logarithms instead of the original variables.
- Range of odds, odds ratios and logits (differences)
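The relation between odds and logits, and their ranges, can be illustrated with a short Python sketch. The probabilities are made up for the example; the natural logarithm is assumed.

```python
import math


def odds(p):
    """Odds of an event with probability p (0 < p < 1): ranges over (0, infinity)."""
    return p / (1 - p)


def logit(p):
    """Logit = natural log of the odds: ranges over (-infinity, +infinity)."""
    return math.log(odds(p))


# Odds are asymmetric around 1, logits are symmetric around 0.
for p in (0.1, 0.25, 0.5, 0.75, 0.9):
    print(p, odds(p), logit(p))

# Comparing two groups (made-up probabilities):
p_a, p_b = 0.75, 0.25
odds_ratio = odds(p_a) / odds(p_b)     # multiplicative comparison
logit_diff = logit(p_a) - logit(p_b)   # additive comparison

# The log of the odds ratio is the difference of the logits.
print(math.log(odds_ratio), logit_diff)
```

This is the practical reason for working with logits: a ratio of odds on the original scale becomes a simple difference on the logarithmic scale.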


Note at the end: Loglinear analysis is confirmatory. It makes it possible to test dependencies, the inclusion of variables (or their interactions) in the model, the fit of the model etc.