Chapter 2: Data

by Alyssa Webb
What are Data?
● Data come in many different types, but they are useless
without their context.
● Not all data represented by numbers are numerical. (ex:
1=boy, 2=girl)
● Who, What, When, Where, Why, and How? provide
context for data.
● If you can’t answer Who and What, then you don’t
have data.
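As a small illustration of the point about numbers that are not numerical, here is a minimal Python sketch. The coded responses are invented for illustration and are not from the chapter. It treats the codes 1 and 2 as labels: counting them is meaningful, while averaging them is not.

```python
from collections import Counter

# Hypothetical coded responses (1 = boy, 2 = girl); values invented for illustration.
codes = [1, 2, 2, 1, 2]
labels = {1: "boy", 2: "girl"}

# Decode the codes: the numbers are only names for categories.
decoded = [labels[c] for c in codes]

# Counting cases per category is a meaningful summary.
counts = Counter(decoded)
print(counts["boy"], counts["girl"])

# The mean of the codes can be computed, but it describes nothing real,
# because the codes carry no units and no quantity.
mean_code = sum(codes) / len(codes)
```

Swapping the codes (1=girl, 2=boy) would change the mean but not the information in the data, which is one way to see that the variable is categorical.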
Who
● Who are the cases for which we have collected data.
● Respondents- people who answer a survey
● Subjects/Participants- people on whom we experiment
● Experimental Units- animals, plants, and inanimate
objects
What and Why
● The characteristics recorded about each
individual are called variables. They
should have a name that identifies What
has been measured.
● The Why of analysis will shape how we
view the variable.
Variable Types
● categorical- when a variable names
categories and answers questions about
how cases fall into those categories
● quantitative- when a measured variable
with units answers questions about the
quantity of what is measured
Where, When, and How
● Where and When give us information
about the context.
● How the data are collected can make the
difference between insight and
nonsense.
Identifying Identifiers
● Identifier variables- categorical variables
with exactly one individual in each
category. ex: Social Security number,
FedEx tracking number, etc.
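One way to check this definition in practice is to test whether every value of a variable occurs exactly once. The sketch below is a rough illustration only: the function name `is_identifier` and the sample values are hypothetical, not from the chapter.

```python
def is_identifier(values):
    """Return True if every value occurs exactly once (one case per category)."""
    values = list(values)
    return len(values) > 0 and len(set(values)) == len(values)

# Hypothetical sample values, invented for illustration.
tracking_numbers = ["TN-001", "TN-002", "TN-003"]
jockeys = ["Smith", "Jones", "Smith"]

print(is_identifier(tracking_numbers))  # every value is unique
print(is_identifier(jockeys))           # "Smith" repeats, so not an identifier
```

Uniqueness in a sample is only a hint. Whether a variable is truly an identifier depends on how it was assigned, which is part of the data's context.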
Data Tables
● A data table is an arrangement of data in
which each row represents a case and each
column represents a variable.
● A case is an individual about whom or which
we have the data.
● A variable holds information about the same
characteristic for many cases.
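The row-and-column layout described above can be sketched in Python as a list of records. The diary values and the names `table` and `column` are assumptions made up for this illustration. Each record is one case (a row), and each key is one variable (a column).

```python
# Hypothetical sleep-diary records, invented for illustration.
# Each dict is one row (a case); each key is one column (a variable).
table = [
    {"day": 1, "sleep_hours": 7.5, "woke_early": "no"},
    {"day": 2, "sleep_hours": 6.0, "woke_early": "yes"},
    {"day": 3, "sleep_hours": 8.0, "woke_early": "no"},
]

def column(rows, name):
    """Collect one variable's values across all cases."""
    return [row[name] for row in rows]

cases = len(table)                 # number of rows
variables = list(table[0].keys())  # column names
sleep = column(table, "sleep_hours")

print(cases, variables)
print(sleep)
```

Reading across a row gives everything recorded about one case. Reading down a column gives one characteristic for all cases.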
Homework Problems
For each description of data, identify the
W’s, name the variables, specify for each
variable whether its use indicates it should
be treated as categorical or quantitative,
and, for any quantitative variable, identify
the units in which it was measured (or note
that they were not provided).
problem #23
In the Spring 2001 issue of Chance magazine, a
psychology professor reported on data he had collected
about his sleep patterns. He kept daily records of the
number of hours of sleep he got, whether or not he
suffered from “early awakening”, whether or not he
watched TV in the morning and in the evening, the
number of hours he spent standing during the day, and his
mood (happy/sad, on a scale from 10-90).
problem #23 answer
Who- Days
What- Sleep, wake early, TV, hours standing, mood
When- 2001
Where- At home
Why- To analyze sleep patterns
How- Daily recording
Variable- Sleep, quantitative, hours
Variable- Wake early, categorical
Variable- TV, categorical
Variable- Hours standing, quantitative, hours
Variable- Mood, quantitative, scale 10-90
problem #25
The Kentucky Derby is a horse race that has been
run every year since 1875 at Churchill Downs, Louisville,
Kentucky. The race started as a 1.5 mile race, but in 1896
it was shortened to 1.25 miles because experts felt that
3-year-old horses shouldn’t run such a long race that
early in the season (it has been run in May every year but
one--1901--when it took place on April 29). Here are the
data for the first few recent races.
problem #25 answer
Who- Kentucky Derby Races
What- Date, winner, margin, jockey, net proceeds to winner, duration, track condition
When- 1875 to 2004
Where - Churchill Downs, Louisville, Kentucky
Why- To see horse race trends
How- Official statistics collected at the races
Variable- Date, quantitative, day and year
Variable- Winner, identifier
Variable- Margin, quantitative, horse lengths
Variable- Jockey, categorical
Variable- Net proceeds to winner, quantitative, dollars
Variable- Duration, quantitative, minutes and seconds
Variable- Track condition, categorical