Stats 245.3 Introduction to Statistical Methods


Stats 245.3

Introduction to Statistical Methods

Instructor: W.H. Laverty
Office: 235 McLean Hall
Phone: 966-6096
Lectures: M W F 11:30am - 12:20pm, Thorv 271
Lab: W 3:30 - 4:20, Physics 107
Evaluation:
• Assignments, Labs, Term tests - 40% (a term test approximately every 2nd week)
• Final Examination - 60%

Dates for midterm tests:
1. Wednesday, Sept 17 (in the lab, 3:30pm)
2. Wednesday, October 01 (in the lab, 3:30pm)
3. Wednesday, October 15 (in the lab, 3:30pm)
4. Wednesday, October 29 (in the lab, 3:30pm)
5. Wednesday, November 19 (in the lab, 3:30pm)
6. Wednesday, December 03 (in the lab, 3:30pm)

Each test and the Final Exam are Open Book

Students are allowed to take in Notes, texts, formula sheets, calculators (No laptop computers.) The tests and the Final Exam are multiple choice and computer marked – Students need an HB pencil and to identify their paper with their student number.

Computer Assignments – due dates and time
1. TBA
2. TBA
3. TBA
4. TBA

Computer Assignments

It is important to learn to use at least one of the powerful statistical packages (SPSS, Minitab, S-plus, SAS, R). Statistical computations very quickly move beyond what simple computing devices (hand-held calculators, computer spreadsheets) can feasibly handle. These assignments are designed to give some initial experience with these packages.

Computer Assignments will be accepted and given a mark if they are submitted after the due date and time; however, assignments that are submitted late will not be returned.

Text

• The lectures will be given in PowerPoint
• These will be posted on the Stats 245 website
• Tables that are required will be posted on the Stats 245 website
• A text is not required
• I will post a list of books in the library that can be consulted

Alternative Texts (Available in Library)

Title (Author(s))

1. Statistics: Informed Decisions Using Data (Sullivan)
2. Introductory Statistics (Mann)
3. Modern Elementary Statistics (Freund)
4. Elementary Statistics: A Brief Version (Bluman)
5. Elementary Statistics (Hoel)
6. Statistics: The Exploration and Analysis of Data (Devore and Peck)
7. Statistics - A First Course (Freund)
8. Statistics - A First Course (Saunders, Smit, Adatia & Larson)
9. Basic Statistical Concepts (Bartz)
10. An Introduction to Statistical Methods and Data Analysis (Ott)
11. Introductory Statistics (Wonnacott & Wonnacott)

To download lectures

1. Go to the stats 245 web site a) Through PAWS or b) by going to the website of the department of Mathematics and Statistics -> people -> faculty -> W.H. Laverty -> Stats 245-> Lectures.

2. Then a) select the lecture b) right-click and choose "Save as"

To print lectures

1. Open the lecture using MS PowerPoint
2. Select the menu item File -> Print

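The following dialogue box appears:

[Screenshot: the PowerPoint Print dialogue box]

In the "Print what" box, select "Handouts". Set "Slides per page" to 6 or 3.

6 slides per page will result in the least amount of paper being printed.

[Diagram: handout page holding slides 1 to 6]

3 slides per page leaves room for notes.

[Diagram: handout page holding slides 1 to 3 with space for notes]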

Course Outline

Introduction

• Populations, samples
• Variables
• Data Collection

Exploratory Statistics

• Organizing and displaying Data
• Numerical measures of Central Tendency and Variability
• Describing Bivariate Data

Probability Theory

• Concepts of Probability
• Random variables and their distributions
• Binomial distribution, Normal distribution

Inferential Statistics

• Estimation, Hypotheses testing
• Comparing Samples
• Analyzing count data
• Regression and Correlation
• Non-parametric Statistics

End – Lecture 1

Introduction

The circular process of research:

1. Questions arise about a phenomenon
2. A decision is made to collect data
3. A decision is made as to how to collect the data
4. The data is collected
5. The data is summarized and analyzed
6. Conclusions are drawn from the analysis

The conclusions in turn raise new questions, and the cycle repeats.

What is Statistics?

It is the major mathematical tool of scientific inference (research), with an interest in drawing conclusions from data: data that is to some extent corrupted by some component of random variation (random noise).

Random variation (or random noise) can be defined to be the variation in the data that is not accounted for by factors considered in the analysis.

Example

Suppose we are collecting data on • Blood Pressure • Height • Weight • Age

Suppose we are interested in how • Blood Pressure is influenced by the following factors • Height • Weight • Age

Blood Pressure will not be perfectly predictable from:
• Height
• Weight
• Age

There will be departures (random variation) from a perfect prediction because of other factors that could affect Blood Pressure (diet, exercise, hereditary factors).
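The idea of random variation in the blood pressure example can be sketched with a small simulation. This is only an illustration, not part of the course material: the prediction rule, its coefficients and the size of the noise are all invented, and the function names `predicted_bp` and `observed_bp` are hypothetical.

```python
import random

random.seed(245)  # fixed seed so the sketch is repeatable

def predicted_bp(height_cm, weight_kg, age_years):
    """Hypothetical prediction rule; the coefficients are invented."""
    return 70 + 0.1 * height_cm + 0.3 * weight_kg + 0.4 * age_years

def observed_bp(height_cm, weight_kg, age_years):
    """Prediction plus random variation from factors not in the analysis
    (diet, exercise, hereditary factors), modelled here as Gaussian noise."""
    noise = random.gauss(0, 8)  # assumed noise level, for illustration only
    return predicted_bp(height_cm, weight_kg, age_years) + noise

# Five cases with identical height, weight and age
prediction = predicted_bp(170, 75, 50)
observations = [observed_bp(170, 75, 50) for _ in range(5)]
departures = [obs - prediction for obs in observations]

print(f"prediction: {prediction:.1f}")
for obs, dep in zip(observations, departures):
    print(f"observed {obs:6.1f}   departure {dep:+5.1f}")
```

All five cases share the same height, weight and age, so they share the same prediction, yet their observed values differ. Those departures are the random variation.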

Another Example

In this example we are interested in the use of:
1. antidepressants,
2. mood stabilizing medication,
3. anxiety medication,
4. stimulants and
5. sleeping pills.

The data were collected for n = 16383 cases.

In addition we are interested in how the use of these medications is affected by:
1. Age – 20-29, 30-39, 40-49, 50-59, 60-69, 70+
2. Gender – Male, Female
3. Education
– < Secondary
– Secondary Grad.
– some Post-Sec.
– Post-Sec. Grad.
4. Income – Low, Low Mid, Up Mid, High
5. Role
– parent, partner, worker
– parent, partner
– parent, worker
– partner, worker
– worker only
– parent only
– partner only
– no roles

Some questions of interest

1. How are the dependent variables (antidepressant use, mood stabilizing medication use, anxiety medication use, stimulants use, sleeping pill use) interrelated?

2. How are the dependent variables (drug use) related to the independent variables (age, gender, income, education and role)?

• Again the relationships will not be perfect
• This is because of the effects of other factors (variables) that have not been considered in the study
• If the data is recollected, the patterns observed at the second collection will not be exactly the same as those observed at the first collection

The data appears in the following Excel file Drug data

In Statistics

• Questions – About some scientific, sociological, medical or economic phenomena
• Data – The purpose of the data is to find answers to the questions
• Answers – Because of the random variation in the data (the noise), conclusions based on the data will be subject to error

The circular process of research:

In what part of this process does statistics play a role?

[Diagram: the research cycle, with two steps labelled "Statistics" and the step "A decision is made as to how to collect the data" labelled "Experimental Design".]

Statistical Theory is interested in 1. The design of the data collection procedures. (Experimental designs, Survey designs). The experiment can be totally lost if it is not designed correctly.

2. The techniques for analyzing the data.

In any statistical analysis it is important to assess the magnitude of the error made by the conclusions of the analysis.

Consider the following statement:

You can prove anything with Statistics.

In fact:

One is unable to “prove” anything with Statistics.

At the end of any statistical analysis there always is a possibility of an error in any of the decisions that it makes.

The success of a research project does not depend on its conclusions.

The success of a research project depends on the accuracy of its conclusions.

If one is testing the effectiveness of a drug, there are two possible conclusions:
1. The drug is effective.
2. The drug is not effective.

The success of this project does not depend on its conclusions. The success depends on the accuracy of its conclusions.

For this reason: It is extremely important in any study to assess the accuracy of its conclusions

Some definitions important to Statistics

A population:

this is the complete collection of subjects (objects) that are of interest in the study. There may be (and frequently are) more than one in which case a major objective is that of comparison.

A case (elementary sampling unit):

This is an individual unit (subject) of the population.

A variable:

a measurement or type of measurement that is made on each individual case in the population.

Types of variables

Some variables may be measured on a numerical scale while others are measured on a categorical scale. The nature of the variables has a great influence on which analysis will be used.

For variables measured on a numerical scale the measurements will be numbers.

Ex: Age, Weight, Systolic Blood Pressure

For variables measured on a categorical scale the measurements will be categories.

Ex: Sex, Religion, Heart Disease

Note

Sometimes variables can be measured on both a numerical scale and a categorical scale. In fact, variables measured on a numerical scale can always be converted to measurements on a categorical scale.

Example

The following variables were evaluated for a study of individuals receiving head injuries in Saskatchewan.

1. Cause of the injury (categorical)
• Motor vehicle accident
• Fall
• Violence
• Other

2. Time of year (date) (numerical or categorical)
• summer
• fall
• winter
• spring

3. Sex of injured individual (categorical)
• male
• female

4. Age (numerical or categorical)
• < 10
• 10-19
• 20-29
• 30-49
• 50-65
• 65+

5. Mortality (categorical)
• Died from injury
• alive
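Converting a numerical variable to a categorical one can be sketched as follows, using the age groups of the head injury study. The function name `age_group` and the list of ages are made up for illustration. The listed bands "50-65" and "65+" overlap at 65, so the sketch assumes that an age of exactly 65 goes into "65+".

```python
def age_group(age):
    """Map a numerical age (in years) to the head-injury study's categories.

    Assumption: an age of exactly 65 is placed in "65+", since the
    listed bands "50-65" and "65+" overlap at 65.
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    if age < 10:
        return "< 10"
    if age < 20:
        return "10-19"
    if age < 30:
        return "20-29"
    if age < 50:
        return "30-49"
    if age < 65:
        return "50-65"
    return "65+"

ages = [4, 17, 25, 42, 64, 65, 80]   # made-up ages for illustration
groups = [age_group(a) for a in ages]
print(groups)
```

The conversion only works in one direction: once an age has been recorded as "30-49", the original number cannot be recovered from the category.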

Types of variables

In addition some variables are labeled as dependent variables and some variables are labeled as independent variables.

This usually depends on the objectives of the analysis.

Dependent variables are output or response variables while the independent variables are the input variables or factors.

Usually one is interested in determining equations that describe how the dependent variables are affected by the independent variables

Example

Suppose we are collecting data on • Blood Pressure • Height • Weight • Age

Suppose we are interested in how • Blood Pressure is influenced by the following factors • Height • Weight • Age

Then
• Blood Pressure is the dependent variable and
• Height
• Weight
• Age
are the independent variables.
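As a rough sketch of what "an equation describing how the dependent variable is affected by an independent variable" can look like, the code below fits a straight line by least squares. It is a simplification: only one independent variable (Age) is used, the five data points are made up, and least squares is one common choice of method, not necessarily the one used later in the course.

```python
# Made-up data for five cases (not from the course)
age = [30, 40, 50, 60, 70]          # independent variable
bp = [118, 125, 130, 138, 144]      # dependent variable

n = len(age)
mean_age = sum(age) / n
mean_bp = sum(bp) / n

# Least-squares slope and intercept for the line bp = intercept + slope * age
sxy = sum((x - mean_age) * (y - mean_bp) for x, y in zip(age, bp))
sxx = sum((x - mean_age) ** 2 for x in age)
slope = sxy / sxx
intercept = mean_bp - slope * mean_age

# Residuals: the part of bp that the fitted equation does not account for
residuals = [y - (intercept + slope * x) for x, y in zip(age, bp)]

print(f"bp = {intercept:.1f} + {slope:.2f} * age")
print([round(r, 1) for r in residuals])
```

For these numbers the fitted line has slope 0.65 and intercept 98.5. The residuals are not all zero, which is the random variation left over after Age has been taken into account.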

Example – Head Injury study

Suppose we are interested in how • Mortality is influenced by the following factors • Cause of head injury • Time of year • Sex • Age

Then
• Mortality is the dependent variable and
• Cause of head injury
• Time of year
• Sex
• Age
are the independent variables.

Dependent variable = response variable; independent variable = predictor variable.

A population:

this is the complete collection of subjects (objects) that are of interest in the study. There may be (and frequently are) more than one in which case a major objective is that of comparison.

A case (elementary sampling unit):

This is an individual unit (subject) of the population.

A variable:

a measurement or type of measurement that is made on each individual case in the population.

Some variables may be measured on a numerical scale while others are measured on a categorical scale. Some variables are labeled as dependent variables and some are labeled as independent variables.

Dependent variables are output or response variables while the independent variables are the input variables or factors.


A sample:

Is a subset of the population

In statistics:

One draws conclusions about the population based on data collected from a sample

Reasons:

Cost

It is less costly to collect data from a sample than from the entire population.

Accuracy

Data from a sample sometimes leads to more accurate conclusions than data from the entire population. Costs saved from using a sample can be directed to obtaining more accurate observations on each case in the sample.

Types of Samples

Different types of samples are determined by how the sample is selected.

Convenience Samples

In a convenience sample the subjects that are most convenient to the researcher are selected as objects in the sample. This is not a very good procedure for inferential Statistical Analysis but is useful for exploratory preliminary work.

Quota samples

In quota samples subjects are chosen conveniently until quotas are met for different subgroups of the population. This also is useful for exploratory preliminary work.

Random Samples

Random samples of a given size are selected in such a way that all possible samples of that size have the same probability of being selected.
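The definition of a random sample can be checked on a tiny example. The sketch below assumes a made-up population of four cases and samples of size 2; it lists every possible sample and then counts how often each one turns up when Python's `random.sample` is used repeatedly.

```python
import itertools
import random
from collections import Counter

random.seed(0)

population = ["A", "B", "C", "D"]   # a tiny made-up population of 4 cases
sample_size = 2

# All possible samples of size 2 (order ignored): there are 6 of them
all_samples = list(itertools.combinations(population, sample_size))

# Draw many random samples and count how often each possible sample appears
draws = 60_000
counts = Counter(
    tuple(sorted(random.sample(population, sample_size)))
    for _ in range(draws)
)

for s in all_samples:
    print(s, round(counts[s] / draws, 3))   # each should be close to 1/6
```

Each of the 6 possible samples appears in roughly one sixth of the draws, which is what "the same probability of being selected" means in practice.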

Convenience samples and quota samples are useful for preliminary studies. It is, however, difficult to assess the accuracy of estimates based on this type of sampling scheme. Sometimes, however, one has to be satisfied with a convenience sample and assume that it is equivalent to a random sample.

[Diagram: a population made up of individual cases (×), a sample drawn from it, and the variables X, Y, Z measured on each case.]

Some other definitions

A population statistic (parameter):

Any quantity computed from the values of variables for the entire population.

A sample statistic:

Any quantity computed from the values of variables for the cases in the sample.

Since only cases from the sample are observed:
– only sample statistics are computed
– these are used to make inferences about population statistics
– it is important to be able to assess the accuracy of these inferences
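The difference between a population statistic (parameter) and a sample statistic can be sketched with simulated data. Everything here is an assumption made for illustration: the population of 10,000 blood pressure values is generated artificially, and the sample size of 100 is arbitrary.

```python
import random
import statistics

random.seed(1)

# Made-up population: 10,000 systolic blood pressure values
population = [random.gauss(120, 15) for _ in range(10_000)]
population_mean = statistics.mean(population)   # parameter (usually unknown)

# In practice only a sample is observed
sample = random.sample(population, 100)
sample_mean = statistics.mean(sample)           # sample statistic

# Repeating the sampling shows how much the sample statistic varies
repeat_means = [
    statistics.mean(random.sample(population, 100)) for _ in range(200)
]
spread = statistics.stdev(repeat_means)

print(f"population mean: {population_mean:.2f}")
print(f"one sample mean: {sample_mean:.2f}")
print(f"spread of sample means over 200 samples: {spread:.2f}")
```

The sample mean is close to the population mean but not equal to it, and it changes from sample to sample. The spread of the repeated sample means gives a sense of how accurate an inference from a single sample is likely to be.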

Organizing Data

the next topic