Data and Social Research
Chuck Humphrey
Data Library
Rutherford North Library
Outline

Research data
• Connect how social research uses quantitative evidence with data
• Discuss how statistics are related to data
• Statistics are about definitions and classifications
• Being a critical user of statistics
• Understanding the Census
• Aggregate data and microdata
Uses of quantitative evidence

• Providing a description of social phenomena
    This typically entails answering a question about the scale or scope of some social group or characteristics of the group.
• Making a comparison among social entities
    This typically involves establishing the degree of similarity or dissimilarity among social entities.
• Identifying relationships among social variables
    This approach looks at the correlation among social phenomena. How are things related?
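The three uses can be illustrated with a minimal Python sketch. The regions, counts and schooling figures are invented for illustration and are not real survey results; the variable names are assumptions as well.

```python
from math import sqrt

# Hypothetical figures for four regions (invented for illustration only).
regions = {
    "Region A": {"population": 1000, "smokers": 250, "years_schooling": 11.0},
    "Region B": {"population": 2000, "smokers": 400, "years_schooling": 12.0},
    "Region C": {"population": 1500, "smokers": 225, "years_schooling": 13.0},
    "Region D": {"population": 500, "smokers": 50, "years_schooling": 14.0},
}

# 1. Description: how large is the group?
total_smokers = sum(r["smokers"] for r in regions.values())

# 2. Comparison: how do the regions differ in their smoking rates?
rates = {name: r["smokers"] / r["population"] for name, r in regions.items()}
highest = max(rates, key=rates.get)
lowest = min(rates, key=rates.get)

# 3. Relationship: do smoking rates move with average years of schooling?
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

schooling = [r["years_schooling"] for r in regions.values()]
r_value = pearson(schooling, [rates[name] for name in regions])

print(total_smokers, highest, lowest, round(r_value, 2))
```

A total answers the descriptive question, the rates support a comparison across regions, and the correlation coefficient summarizes how two variables move together.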
How statistics and data differ
Statistics
• numeric facts & figures
• derived from data, i.e., already processed
• presentation-ready
• needs definitions
• published
Data
• numeric files created and organized for analysis or processing
• requires processing
• not display-ready
• needs detailed documentation
• disseminated, not published
Statistics and data
[Figure: a statistical table with its dimensions labelled]
Geography: Region
Time: Periods
Social Content: Smokers, Education, Age, Sex
Six dimensions or variables in this table.
The cells in the table are the number of estimated smokers.
Statistics and data
Statistics are about definitions!

Tables are structured around geography, time and
social content based on attributes of the unit of
observation. These properties all need definitions.

Statistics are dependent
on definitions. You may
think of statistics as
numbers, but the numbers
represent measurements
or observations based on
specific definitions.
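A minimal Python sketch of how a definition shapes a statistic. The respondents and the two definitions of "smoker" are assumptions made up for this example and do not come from any actual survey.

```python
# Hypothetical survey responses (invented for illustration).
respondents = [
    {"id": 1, "smoking": "daily"},
    {"id": 2, "smoking": "occasional"},
    {"id": 3, "smoking": "former"},
    {"id": 4, "smoking": "never"},
    {"id": 5, "smoking": "daily"},
    {"id": 6, "smoking": "occasional"},
]

# Two possible definitions of "smoker" (both are assumptions for this sketch).
DEFINITIONS = {
    "daily smokers only": {"daily"},
    "daily or occasional smokers": {"daily", "occasional"},
}

def count_smokers(records, categories):
    """Count the records whose smoking status falls within the definition."""
    return sum(1 for r in records if r["smoking"] in categories)

counts = {
    label: count_smokers(respondents, categories)
    for label, categories in DEFINITIONS.items()
}

for label, n in counts.items():
    print(f"{label}: {n}")
```

The same six records yield two different counts of "smokers", which is why a statistic cannot be interpreted without the definition behind it.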
Statistics are about definitions!

Consider the following example from the 2006
Canadian Census on the data behind some statistics
about visible minorities.
Visible Minority Groups (15), Generation Status (4), Age Groups (9) and Sex (3) for the Population 15
Years and Over of Canada, Provinces, Territories, Census Metropolitan Areas and Census
Agglomerations, 2006 Census - 20% Sample Data
Statistics are about definitions!

How is visible minority status identified in the
Census? Are Aboriginal peoples counted among the
visible minorities in Canada? What is the definition
of visible minority?
Statistics involve classifications
Classifications
Sex: Total, Male, Female
Periods: 1994-1995, 1996-1997
Statistics involve classifications
Some classifications are based
on standards while others are
based on convention or
practice.
Standard geography classifications
are an example of the former.
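A classification can be thought of as a rule that assigns every observation to exactly one category. The sketch below uses a hypothetical four-group age classification; the group boundaries and labels are assumptions for illustration and are not an official standard.

```python
from collections import Counter

# A hypothetical age classification: (lower bound, upper bound, label).
# An upper bound of None means the group is open-ended.
AGE_GROUPS = [
    (15, 24, "15 to 24 years"),
    (25, 44, "25 to 44 years"),
    (45, 64, "45 to 64 years"),
    (65, None, "65 years and over"),
]

def classify_age(age):
    """Return the label of the group that contains the given age."""
    for low, high, label in AGE_GROUPS:
        if age >= low and (high is None or age <= high):
            return label
    return "Not classified"  # outside the classification, e.g. under 15

# Invented ages for seven people.
ages = [17, 23, 31, 44, 45, 70, 12]
table = Counter(classify_age(a) for a in ages)

for label, n in table.items():
    print(f"{label}: {n}")
```

Changing the group boundaries changes the resulting counts, so two tables can only be compared when they use the same classification.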
Facts about statistics and data
• Statistics are derived from observational, experimental and computational data.
• A table is a format for displaying statistics and presents a summary or one view of the data.
• Tables are structured around geography, time and attributes of the unit of observation.
• Statistics are dependent on definitions.
• Working with data requires some computing skills with analytic software.
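The idea that a table presents one view of the data can be sketched in Python. The cells below are keyed by geography, time and one social attribute; the region names and counts are invented, and the `view` helper is a hypothetical function written for this example.

```python
from collections import defaultdict

DIMENSIONS = ("region", "period", "sex")

# Hypothetical cells keyed by (region, period, sex); values are estimated counts.
cells = {
    ("East", "1994-1995", "Male"): 120,
    ("East", "1994-1995", "Female"): 100,
    ("East", "1996-1997", "Male"): 110,
    ("East", "1996-1997", "Female"): 95,
    ("West", "1994-1995", "Male"): 80,
    ("West", "1994-1995", "Female"): 70,
    ("West", "1996-1997", "Male"): 75,
    ("West", "1996-1997", "Female"): 65,
}

def view(cells, keep):
    """Sum the cells over every dimension that is not listed in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for key, value in cells.items():
        totals[tuple(key[i] for i in positions)] += value
    return dict(totals)

by_period = view(cells, ("period",))
by_region_and_sex = view(cells, ("region", "sex"))

print(by_period)
print(by_region_and_sex)
```

Both results come from the same underlying cells; each one is simply a different summary of them.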
Questions to ask about statistics
• Who published this statistic?
    • Can you name the producer or distributor of the data?
    • You need this information to provide a citation for each statistic.
    • You should ask yourself what motive is behind this published statistic.
• What view of the data is shown in this statistic?
    • What level of geography is shown?
    • What time period is shown?
    • What social characteristics are shown?
Questions to ask about statistics
• What concepts are represented in this statistic?
    • Are definitions provided with the statistic for geography, time or the social characteristics?
    • Was a standard classification system used for the categories of the statistic?
• Can you identify a data source for the statistic?
    • If there isn’t a data source, the statistic isn’t real.
    • Is there enough information that you could find the data?
    • Can you name the data source itself?
Statistics are presentation ready

Tables and charts (or graphs) are typically
used to display many statistics at once.
You will find statistics sprinkled in text as
part of a narrative describing some
phenomenon; but tables and charts are the
primary methods of organizing and
presenting statistics.
Population and demographics


The Census is one of the most important sources
of statistical information about Canada. It is the
largest survey conducted in Canada and,
consequently, is the primary source for small
area statistics.
To use data from the Census, you must know:
• The characteristics collected in the Census that are available for the spatial units used to disseminate results;
• The variety of spatial units used to disseminate Census results.
Census of Population
• Two forms are used to collect the Census: 2A, which goes to 80% of the households, and 2B, which goes to the other 20%.
• In 2006, the 2A form contained 8 questions while the 2B form had these 8 plus 53 additional questions.
• Long history of specific questions (see the Census Dictionary).
• You need to understand the content of the Census to know what statistics are possible from the Census.
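The arithmetic behind the two forms can be sketched as follows. The question counts and the 20% share come from the slide; the simple design weight of 1 / 0.20 and the sample count of 340 are assumptions for illustration, and the weighting Statistics Canada actually applies may differ.

```python
SHORT_FORM_QUESTIONS = 8    # 2A form in 2006 (from the slide)
ADDITIONAL_QUESTIONS = 53   # extra questions on the 2B form (from the slide)
LONG_FORM_SHARE = 0.20      # share of households that receive the 2B form

# Total number of questions on the 2B (long) form.
long_form_questions = SHORT_FORM_QUESTIONS + ADDITIONAL_QUESTIONS

# Assumption: each sampled household stands in for 1 / 0.20 = 5 households.
design_weight = 1 / LONG_FORM_SHARE

def estimate_total(sample_count, weight=design_weight):
    """Scale a count observed in the sample up to an estimate for all households."""
    return sample_count * weight

# Hypothetical: 340 sampled households in an area report some characteristic.
estimate = estimate_total(340)

print(long_form_questions, design_weight, estimate)
```

This is why statistics based on the 2B form are labelled as 20% sample data: they are estimates scaled up from a sample rather than complete counts.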
[Diagram: outputs of the 2006 Census]
Post-censal surveys: PALS, EDS, APS
Stats: custom tabulations, STC website
Data: PUMF, RDC, DLI (aggregate data, public use microdata)
Microdata and aggregate data
Microdata
• from observational methods
• created from the respondents in a survey
Aggregate Data
• statistics organized in a data file structure
• derived from microdata sources
• used in GIS & time series analysis
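The step from microdata to aggregate data can be sketched in Python. The respondent records, tract identifiers and field names below are invented for illustration.

```python
# Hypothetical microdata: one record per survey respondent.
microdata = [
    {"tract": "0023.00", "sex": "Female", "smoker": True},
    {"tract": "0023.00", "sex": "Male", "smoker": False},
    {"tract": "0023.00", "sex": "Female", "smoker": False},
    {"tract": "0024.00", "sex": "Male", "smoker": True},
    {"tract": "0024.00", "sex": "Female", "smoker": False},
]

def aggregate_by_tract(records):
    """Derive one aggregate record per tract from respondent-level records."""
    totals = {}
    for r in records:
        row = totals.setdefault(r["tract"], {"respondents": 0, "smokers": 0})
        row["respondents"] += 1
        row["smokers"] += int(r["smoker"])
    return totals

aggregate = aggregate_by_tract(microdata)

for tract, row in aggregate.items():
    print(tract, row)
```

In the microdata each row is a person; in the aggregate file each row is a spatial unit, and the individual respondents can no longer be recovered from it.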
Geo-code
Spatial Unit
Geo-referenced data
The unit of analysis makes up the rows in
the data file and is the object being
described by the other variables in the file.
The values for this variable are geo-codes
for Census tracts.
Geo-referenced data
This case in the data file represents
Census Tract 0023.00, which was shown in
the image two slides earlier.
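A minimal sketch of looking up one case by its geo-code. The file extract, column names and values are hypothetical; only the tract name 0023.00 comes from the slide.

```python
import csv
import io

# Hypothetical extract of an aggregate data file: one row per Census tract.
# The column names and values are invented for illustration.
RAW = """ct_name,population,median_age
0022.00,4100,36.5
0023.00,3850,41.2
0024.00,5020,33.8
"""

# DictReader returns every value as text, which keeps the leading zeros
# of the geo-code intact.
rows = list(csv.DictReader(io.StringIO(RAW)))

# Index the rows by geo-code so that a single tract can be retrieved directly.
by_geocode = {row["ct_name"]: row for row in rows}

tract = by_geocode["0023.00"]
population = int(tract["population"])

print(tract["ct_name"], population)
```

Geo-codes are identifiers rather than quantities. Reading 0023.00 as a number would turn it into 23.0, and it would no longer match the code used on maps or in other files.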
The variety of spatial units

Statistics Canada groups the variety of spatial
units associated with the Census into two
groups:
Source: Illustrated Glossary, 2006 Census Geography, Statistics Canada
Urban small area statistics

Census Metropolitan Areas
Source for the graphic: Illustrated Glossary, 2006 Census Geography, Statistics Canada
Metropolitan Areas 2006
Map of Edmonton CMA
Census results for 2006

• Standard Census data products
    • Highlight tables
    • Profiles
    • Census trends
    • Topic-based tabulations
• For smaller areas outside CMAs or for dissemination areas, statistics need to be retrieved from the Data Library
• Public use microdata files for individuals
CANSIM
• CANSIM is a very large database containing socio-economic statistics for Canada. There are currently over 38 million time series organized in approximately 2,800 tables.
• The statistics in CANSIM come from surveys (e.g., the Labour Force Survey), administrative data (e.g., crime and justice) and simulations or models (e.g., population projections).
• Geography, content and time are basic to retrieving time series from CANSIM.
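The role of geography, content and time in retrieval can be sketched with a small in-memory store. This is not the CANSIM interface; the store, the `retrieve` function, the series names and the values are all hypothetical.

```python
# Hypothetical store of time series keyed by (geography, content).
# Each series maps a year to a value; all figures are invented.
SERIES = {
    ("Region A", "Indicator X"): {2003: 10.0, 2004: 10.5, 2005: 11.0, 2006: 11.5},
    ("Region A", "Indicator Y"): {2003: 3.0, 2004: 2.8, 2005: 2.6, 2006: 2.4},
    ("Region B", "Indicator X"): {2003: 20.0, 2004: 21.0, 2005: 22.0, 2006: 23.0},
}

def retrieve(geography, content, start, end):
    """Return (year, value) pairs for one series within an inclusive year range."""
    series = SERIES[(geography, content)]
    return [(year, series[year]) for year in sorted(series) if start <= year <= end]

selection = retrieve("Region A", "Indicator X", 2004, 2006)

for year, value in selection:
    print(year, value)
```

Geography and content pick out a single series, and the time range then narrows that series to the periods of interest.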
Tools for working with data
• Online copies of questionnaires and data documentation from DLI.
    http://www.statcan.gc.ca/dli-idd/dli-idd-eng.htm
• Online catalogues, such as the Statistics Canada DLI title list, the ICPSR catalogue, the CESSDA portal and the ASSDA Nesstar server.
    http://www.statcan.gc.ca/dli-idd/dli-idd-eng.htm
    http://www.icpsr.umich.edu/
    http://www.cessda.org/index.html
    http://assda.anu.edu.au/
Tools for working with data
• Online access to data through IDLS
• Offline access through the Data Library
    http://guides.library.ualberta.ca/data
    Rutherford North, 1st Floor (492-5212)
• Statistical software, such as SPSS
    http://www.labs.ualberta.ca/