Andrew Smith
Describing childhood diet with cluster analysis
Young Statisticians’ meeting. 12th April 2011
2
Describing diet with cluster analysis
• Pauline M. Emmett
• P. Kirstin Newby
• Kate Northstone
• World Cancer Research Fund
• MRC, Wellcome Trust, University of Bristol
3
Outline
• Introductions
• ALSPAC
• Food frequency questionnaires
• Dietary patterns
• Cluster analysis
• k-means cluster analysis
• Results
• 3 cluster solution
• Associations with socio-demographic variables
4
ALSPAC
• Avon Longitudinal Study of Parents and Children
• Birth cohort study
• 14,541 pregnant women and their children
• www.bris.ac.uk/alspac
5
Food frequency questionnaires
6
Dietary patterns
• Examine diet as a whole
• Analyse multivariate FFQ data
• Use correlations between foods
• PCA
• Cluster analysis
Image: Paul / FreeDigitalPhotos.net
7
Cluster analysis
• Separate subjects into non-overlapping groups
• Based on ‘distances’ between individuals
• Unsupervised learning
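To make the idea of a 'distance' between individuals concrete, here is a minimal sketch using Euclidean distance on intake vectors. The slides do not say which distance measure was used, so Euclidean distance is an assumption, and the two children and their intake values are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two subjects' intake vectors."""
    if len(a) != len(b):
        raise ValueError("both subjects need the same set of food variables")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical weekly intake frequencies for three food groups (not study data)
child_a = [7.0, 2.0, 0.0]
child_b = [4.0, 6.0, 0.0]
distance_ab = euclidean_distance(child_a, child_b)
```

Children with similar intakes across all foods end up close together, which is what lets a clustering method group them.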
Image: Boaz Yiftach / FreeDigitalPhotos.net
8
k-means cluster analysis
• Most widely used for dietary patterns
• Number of clusters, k, is specified beforehand
• Minimises
– Distance from each subject to his/her cluster mean
– Summed over all subjects in that cluster
– Summed over all clusters
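The quantity being minimised can be written out as a short function. This is a sketch only: it assumes the usual k-means convention of squared Euclidean distance, which the slide does not state explicitly, and the four points are hypothetical.

```python
def within_cluster_ss(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster mean."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for members in clusters.values():
        # Cluster mean, computed variable by variable
        mean = [sum(col) / len(members) for col in zip(*members)]
        # Summed over all subjects in this cluster, then over all clusters
        total += sum(
            sum((x - m) ** 2 for x, m in zip(point, mean)) for point in members
        )
    return total


# Hypothetical data: two tight pairs of points far apart from each other
points = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 12.0]]
good = within_cluster_ss(points, [0, 0, 1, 1])  # pairs kept together
bad = within_cluster_ss(points, [0, 1, 0, 1])   # pairs split up
```

The sensible grouping gives a much smaller value than the split one, which is the sense in which k-means prefers it.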
9
k-means cluster analysis
10
Problems with the standard algorithm
• Short-sighted
• Tends to find solutions that are at a local minimum
– So run the algorithm 100 times from different starting points and choose the solution with the smallest of all the minima
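The restart strategy can be sketched in plain Python. This is an illustration, not the software used in the study. It assumes random starting centres drawn from the data and squared Euclidean distance. The function names (`lloyd_kmeans`, `best_of_restarts`) and the toy data are hypothetical.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd_kmeans(points, k, rng, max_iter=100):
    """One run of the standard algorithm from a random start."""
    centres = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centres[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable: a (possibly local) minimum
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster empties
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    objective = sum(
        squared_distance(p, centres[lab]) for p, lab in zip(points, labels)
    )
    return objective, labels


def best_of_restarts(points, k, n_starts=100, seed=0):
    """Run the algorithm n_starts times and keep the lowest objective."""
    rng = random.Random(seed)
    runs = (lloyd_kmeans(points, k, rng) for _ in range(n_starts))
    return min(runs, key=lambda run: run[0])


# Hypothetical data: three well-separated groups of three points
points = [
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0],
    [20.0, 0.0], [20.0, 1.0], [21.0, 0.0],
]
objective, labels = best_of_restarts(points, k=3)
```

A single run can stop at a poor local minimum if two starting centres fall in the same group. Taking the best of many runs makes that outcome unlikely, though it still does not guarantee the global minimum.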
11
Standardising the input variables
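The slide gives only a title here, so the following is a sketch of one common approach rather than the method used in the study. It assumes z-score standardisation (subtract the mean, divide by the standard deviation), and the variable names and values are hypothetical. The motivation is that a food recorded on a large numeric scale would otherwise dominate the distances.

```python
import statistics


def standardise(columns):
    """Rescale each variable to mean 0 and standard deviation 1."""
    scaled = {}
    for name, values in columns.items():
        mean = statistics.fmean(values)
        sd = statistics.pstdev(values)
        if sd == 0:
            raise ValueError(f"{name} has no variation and cannot be standardised")
        scaled[name] = [(v - mean) / sd for v in values]
    return scaled


# Hypothetical FFQ variables on very different scales (not study data)
ffq = {
    "bread_slices_per_week": [14.0, 21.0, 28.0],
    "fish_portions_per_week": [0.0, 1.0, 2.0],
}
z = standardise(ffq)
```

After rescaling, both variables contribute to the distance on an equal footing, even though bread was originally recorded in much larger numbers than fish.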
12
Reliability of the cluster solution
• Split sample in half
• Perform separate analyses on each half
• See how many children change clusters
• Repeat 5 times
– 32 out of 8,279 children changed cluster (0.4%)
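Counting how many children change cluster needs some care, because cluster labels are arbitrary: cluster 1 in one analysis may be cluster 3 in another. The sketch below assumes each half-sample solution is compared with a reference assignment (for example, one from the full sample), which the slide does not spell out. The helper name `count_changed` and the toy labels are hypothetical.

```python
from itertools import permutations


def count_changed(reference, other, k):
    """Fewest disagreements between two labelings over all relabelings of `other`."""
    best = len(reference)
    for perm in permutations(range(k)):
        changed = sum(1 for r, o in zip(reference, other) if r != perm[o])
        best = min(best, changed)
    return best


# Hypothetical assignments for eight children under two analyses
reference_labels = [0, 0, 1, 1, 2, 2, 2, 0]
half_sample_labels = [2, 2, 0, 0, 1, 1, 0, 2]
changed = count_changed(reference_labels, half_sample_labels, k=3)

# The proportion reported on the slide: 32 of 8,279 children
percent_changed = 100 * 32 / 8279
```

Trying every relabelling is only practical for a small number of clusters, such as the three used here.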
13
Processed cluster: 4,177 children
Image: Suat Eman, Rawich, Master Isolated Images / FreeDigitalPhotos.net
14
Plant-based cluster: 2,065 children
Image: Suat Eman, Paul, Rob Wiltshire, Simon Howden, winnond / FreeDigitalPhotos.net
15
Traditional British cluster: 2,037 children
Image: Suat Eman, Filomena Scalise, Maggie Smith / FreeDigitalPhotos.net
16
Associations with socio-demographic variables
Sex                 n      Processed vs Plant-based  Plant-based vs Traditional British  Traditional British vs Processed
Girls               3,115  1                         1                                   1
Boys                2,941  0.82 (0.72, 0.93)         1.03 (0.89, 1.20)                   1.18 (1.04, 1.34)
17
Associations with socio-demographic variables
Maternal age        n      Processed vs Plant-based  Plant-based vs Traditional British  Traditional British vs Processed
< 21                130    1                         1                                   1
21-25               994    0.59 (0.33, 1.07)         1.07 (0.56, 2.05)                   1.57 (1.02, 2.43)
26-30               2,644  0.52 (0.29, 0.92)         1.20 (0.64, 2.28)                   1.60 (1.04, 2.46)
31+                 2,288  0.37 (0.21, 0.67)         1.50 (0.79, 2.88)                   1.77 (1.13, 2.76)
18
Associations with socio-demographic variables
Maternal education  n      Processed vs Plant-based  Plant-based vs Traditional British  Traditional British vs Processed
CSE                 740    1                         1                                   1
Vocational          504    0.84 (0.60, 1.17)         1.19 (0.82, 1.72)                   1.01 (0.76, 1.32)
O level             2,163  0.65 (0.51, 0.83)         1.46 (1.10, 1.94)                   1.05 (0.86, 1.30)
A level             1,604  0.42 (0.33, 0.55)         2.01 (1.50, 2.69)                   1.18 (0.95, 1.48)
Degree              1,045  0.30 (0.23, 0.39)         2.75 (2.00, 3.76)                   1.22 (0.94, 1.57)
19
Associations with socio-demographic variables
Siblings            n      Processed vs Plant-based  Plant-based vs Traditional British  Traditional British vs Processed
0 older             2,755  1                         1                                   1
1 older             2,317  1.21 (1.03, 1.42)         1.12 (0.94, 1.36)                   0.73 (0.62, 0.86)
2+ older            984    1.58 (1.28, 1.97)         0.99 (0.76, 1.27)                   0.64 (0.52, 0.80)
20
Associations with socio-demographic variables
Siblings            n      Processed vs Plant-based  Plant-based vs Traditional British  Traditional British vs Processed
0 younger           2,946  1                         1                                   1
1 younger           2,490  1.01 (0.86, 1.19)         0.58 (0.48, 0.71)                   1.69 (1.44, 1.99)
2+ younger          620    1.21 (0.92, 1.57)         0.43 (0.33, 0.58)                   1.90 (1.50, 2.40)
21
Summary
• Multivariate methods to compress FFQ data into dietary patterns
• k-means cluster analysis is widespread but must be applied carefully
• Processed, Plant-based and Traditional British clusters in 7-year-old children
• Associated with various socio-demographic variables