Quantitative Evidence
Sociology 519
Chuck Humphrey
Data Library
University of Alberta
Outline
• Quantitative evidence
• Distinction between statistics and data
• Observational evidence
• Statistics are about definitions and classifications
• Aggregate data and microdata
• Understanding the Census
• Access to evidence
• Statistical and aggregate data sources
• Microdata sources
Statistics and Data
Statistics
• numeric facts & figures
• derived from data, i.e., already processed
• presentation-ready
• need definitions
• published
Data
• numeric files created and organized for analysis/processing
• require processing
• not display-ready
• need detailed documentation
• disseminated, not published
Statistics and Data
[Table image with its dimensions labelled]
Geography: Region
Time: Periods
Social Content: Smokers, Education, Age, Sex
The cells in the table are the estimated numbers of smokers.
Six dimensions or variables in this table.
WHERE ARE THE DATA!
Statistics and Data
Stories are told through statistics
• The National Population Health Survey in the previous example had over 80,000 respondents in its 1996-97 sample, and the Canadian Community Health Survey in 2005 had over 130,000 cases. How do we tell the stories about each of these respondents?
• We use statistics to create summaries of these life experiences.
• Data enable us to construct the tables or analyses to tell these summarized stories.
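
As a minimal sketch of how data become summarized stories, the Python snippet below tallies a handful of respondent records into a small table of smoker counts by sex and period. The records, field names and values are invented for illustration and are not drawn from the National Population Health Survey.

```python
from collections import Counter

# Hypothetical microdata: one record per respondent (invented values).
respondents = [
    {"sex": "Female", "period": "1994-1995", "smoker": True},
    {"sex": "Male", "period": "1994-1995", "smoker": False},
    {"sex": "Male", "period": "1994-1995", "smoker": True},
    {"sex": "Female", "period": "1996-1997", "smoker": False},
    {"sex": "Female", "period": "1996-1997", "smoker": True},
    {"sex": "Male", "period": "1996-1997", "smoker": True},
]


def smoker_table(records):
    """Count the smokers falling in each (sex, period) cell of the table."""
    return Counter((r["sex"], r["period"]) for r in records if r["smoker"])


table = smoker_table(respondents)
for (sex, period), count in sorted(table.items()):
    print(f"{sex:<6} {period}  smokers: {count}")
```

Each printed line is one cell of a table; the individual respondents are no longer visible, only the summary.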

Statistics are about definitions!

Tables are structured around geography, time and social
content based on attributes of the unit of observation.
These properties all need definitions.

Statistics are dependent on definitions. You may think of statistics as numbers, but the numbers represent measurements or observations based on specific definitions.
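
To see why definitions matter, the sketch below counts "smokers" in the same records under two different definitions. The values and both definitions are assumptions chosen for illustration; they are not the definitions used by any particular survey.

```python
# Hypothetical records: cigarettes smoked per day by five respondents.
cigarettes_per_day = [0, 0, 1, 5, 20]


def count_smokers(values, is_smoker):
    """Count the values that satisfy a given definition of 'smoker'."""
    return sum(1 for v in values if is_smoker(v))


# Definition 1 (assumed): anyone who smokes at all.
any_smoking = count_smokers(cigarettes_per_day, lambda v: v > 0)

# Definition 2 (assumed): five or more cigarettes a day.
regular_smoking = count_smokers(cigarettes_per_day, lambda v: v >= 5)

print(any_smoking, regular_smoking)
```

The data are identical in both cases; only the definition changes, and the resulting statistic changes with it.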
Statistics involve classifications!
Classifications
Sex: Total, Male, Female
Periods: 1994-1995, 1996-1997
Statistics involve classifications!
Some classifications are based
on standards while others are
based on convention or
practice.
For example, Standard
Geography classifications
What about data?


It is helpful to understand some basics about the
origins of data, especially since statistics are derived
from data. As we will see later, having a good
understanding of data can greatly help in the search
for statistics.
There are three generic methods by which data are
produced. Statistics are generated from the data
produced out of all of these methods.
Methods for producing data
Observational Methods
• Focus is on developing observational instruments to collect data
• Correlation
• Replicate the analysis (same data or similar)
• Statistics summarize observations
Experimental Methods
• Focus is on manipulating causal agents to measure change in a response agent
• Causation
• Replicate the experiment
• Statistics summarize experiment results
Computational Methods
• Focus is on modeling phenomena through mathematical equations
• Prediction
• Replicate the simulation
• Statistics summarize simulation results
Facts about statistics and data
• Statistics are derived from observational, experimental and simulated data.
• A table is a format for displaying statistics and presents a summary or one view of the data.
• Tables are structured around geography, time and attributes of the unit of observation.
• Statistics are dependent on definitions.
• Working with data requires some computing skills with analytic software.

Questions to ask about statistics
• Who published this statistic?
  - Can you name the producer or distributor of the data?
  - You need this information to provide a citation for each statistic.
  - You should ask yourself what motive is behind this published statistic.
• What view of the data is shown in this statistic?
  - What level of geography is shown?
  - What time period is shown?
  - What social characteristics are shown?
Questions to ask about statistics
• What concepts are represented in this statistic?
  - Are definitions provided with the statistic for geography, time or the social characteristics?
  - Was a standard classification system used for the categories of the statistic?
• Can you identify a data source for the statistic?
  - If there isn’t a data source, the statistic isn’t real.
  - Is there enough information that you could find the data?
  - Can you name the data source itself?
The Canadian Census
• The Census is the largest survey conducted in Canada and is taken every five years.
• The last two censuses were in 2001 and 2006. The censuses in years ending in 1 are known as decennial censuses and contain certain questions asked only every ten years (e.g., religion).
Census of Population
• Two forms are used to collect the Census: 2A, which goes to 80% of households, and 2B, which goes to the other 20%.
• In 2006, the 2A form contained 8 questions, while the 2B form had these 8 plus 53 additional questions.
• Long history of specific questions (see the Census Dictionary).
• Need to understand the content of the Census to know what statistics are possible from the Census.
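
Because the 2B form reaches only a sample of households, counts based on its additional questions have to be scaled up to describe the whole population. The sketch below assumes the simplest possible weighting, in which each sampled household stands for 5 households (1 in 5, i.e., 20%). Actual census weighting is more involved, and the sample count in the usage line is invented.

```python
FORM_2A_QUESTIONS = 8         # questions on the 2A form in 2006
FORM_2B_EXTRA_QUESTIONS = 53  # additional questions on the 2B form
ONE_IN_N_GETS_2B = 5          # 20% of households, i.e., 1 household in 5


def total_2b_questions():
    """Total number of questions on the 2B form."""
    return FORM_2A_QUESTIONS + FORM_2B_EXTRA_QUESTIONS


def estimate_total(sample_count, one_in_n=ONE_IN_N_GETS_2B):
    """Scale a 2B sample count to a population estimate.

    Simplifying assumption: every sampled household carries the same weight.
    """
    return sample_count * one_in_n


print(total_2b_questions())   # 8 + 53 questions
print(estimate_total(1200))   # hypothetical sample count of 1,200 households
```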

Census Definitions
• The Census Dictionary is also important for understanding the current definitions of concepts as well as historical definitions.
• Here is an example on Aboriginal identity:

“The Aboriginal identity question was asked for the first time in the 1996 Census. It asked the respondent if he/she was an Aboriginal person, i.e., North American Indian, Métis or Inuit. The question is used to provide counts of persons who identify themselves as Aboriginal persons. The concept of 'Aboriginal identity' was first used in the 1991 Aboriginal Peoples Survey.”
[Diagram: Census 2006 products. Labels: PostCensal, PUMF, PALS, EDS, APS; STATS: Custom Tabulations, STC Website; DATA: RDC, E-STAT, DLI, Aggregate, Public Use Microdata]
[Image labels: Geo-code, Geographic Unit]
Geo-referenced data
The unit of analysis makes up the rows in the data file and is the object being described by the other variables in the file.
The values for this variable are geo-codes
for Census tracts.
Geo-referenced data
This case in the data file represents
Census Tract 0023.00, which was shown in
the image two slides earlier.
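
A geo-referenced data file of this kind can be sketched as a list of rows, one per Census tract, with the geo-code as the identifying variable. In the sketch below, the variable names and every value other than the tract code 0023.00 are hypothetical.

```python
# Hypothetical geo-referenced file: each row (case) is one Census tract.
# Only the tract code 0023.00 comes from the slides; other values are invented.
tracts = [
    {"ct_code": "0022.00", "population": 4100, "households": 1650},
    {"ct_code": "0023.00", "population": 3875, "households": 1520},
    {"ct_code": "0024.00", "population": 5230, "households": 2010},
]


def find_tract(rows, ct_code):
    """Return the row whose geo-code matches ct_code, or None if absent."""
    for row in rows:
        if row["ct_code"] == ct_code:
            return row
    return None


case = find_tract(tracts, "0023.00")
print(case)
```

The geo-codes are stored as strings rather than numbers so that leading and trailing zeros survive, which is what allows the rows to be matched to a map of the same tracts.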
The variety of geographic units

Statistics Canada groups the variety of
geographic units associated with the Census
into two categories:
Source: Illustrated Glossary, 2006 Census Geography, Statistics Canada
Census geo-codes
• Statistics Canada has two categories of geo-code systems:
  - Standard Geographic Classification (SGC)
  - Other geographic entities
Source for the graphic: Illustrated Glossary, 2006 Census Geography, Statistics Canada
Standard geographic classification
Source: Illustrated Glossary, 2006 Census Geography, Statistics Canada
Standard geographic classification, 2006
The “Definitions, data sources and methods” link on the main page of the Statistics Canada website leads to Standard Classifications, which includes Geography.
Other geographic entities

Census Metropolitan Areas
Source for the graphic: Illustrated Glossary, 2006 Census Geography, Statistics Canada
Metropolitan Areas 2006
Map of Edmonton CMA
Online sources for statistics
• For characteristics about Canadians, you need to become familiar with Statistics Canada’s website.
• This is a complex website. Use the “Popular picks” list on the home page and search for statistics by browsing subject terms.
• Historical Statistics

E-STAT
• E-STAT is a portal to free CANSIM time series statistics and Census results from 1981 to 2006.
• CANSIM on Statistics Canada’s website charges $3.00 per time series, while the same statistics accessed through CANSIM on E-STAT are free.

Online guide to published stats
• The Library homepage has useful guides for locating statistics online and in print.
Microdata & aggregate data
Microdata
• from observational methods
• created from the respondents in a survey
Aggregate Data
• statistics organized in a data file structure
• derived from microdata sources
• used in GIS & time series analysis
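
The relationship between the two kinds of file can be sketched in a few lines: microdata hold one row per respondent, while aggregate data hold one row per combination of categories, with statistics attached. The records and variable names below are hypothetical.

```python
from collections import defaultdict

# Hypothetical microdata: one row per survey respondent (invented values).
microdata = [
    {"region": "Alberta", "period": "1994-1995", "age": 34},
    {"region": "Alberta", "period": "1994-1995", "age": 52},
    {"region": "Alberta", "period": "1996-1997", "age": 41},
    {"region": "Ontario", "period": "1994-1995", "age": 29},
    {"region": "Ontario", "period": "1996-1997", "age": 63},
    {"region": "Ontario", "period": "1996-1997", "age": 47},
]


def aggregate(rows):
    """Derive aggregate data: one row per (region, period) with a count and mean age."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["region"], row["period"])].append(row["age"])
    return [
        {
            "region": region,
            "period": period,
            "respondents": len(ages),
            "mean_age": sum(ages) / len(ages),
        }
        for (region, period), ages in sorted(groups.items())
    ]


aggregate_data = aggregate(microdata)
for row in aggregate_data:
    print(row)
```

The output is still a data file with rows and variables, but its unit of observation is now a region in a period rather than a person, which is the structure GIS and time series tools expect.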