Preliminary Concepts

Why Do We Need Statistics?
 Statistics is about making decisions
 Consider the examples below
 You would like to know whether your English skills are good
enough to take psychology courses in English. Should you focus on
practicing English and take more courses on reading and writing?
 The simplest way is to take a test and see your score.
 Let’s say it is 110. Is it a good score?
 You need to buy a simple calculator. How much will you pay for it?
Is there a shop in your neighborhood where you can buy the same
calculator at a lower price?
 Let’s say there are fifteen shops in your town. How will you find the
cheapest one?
Why Do We Need Statistics?
 Which is a better teaching technique: giving out the course
notes and presentations or requiring students to take notes
during class?
 You would like to know which goalkeeper performed better last
season.
 You need to count, for each goalkeeper, the number of shots stopped,
the penalty kicks saved, and the games played.
 You hate ÖSYM and the university entrance exam. You believe that LYS has
nothing to say about a student’s future success at university.
That is, students’ GPA (Grade Point Average) cannot be
predicted from their LYS score. How can you prove it?
Why Do We Need Statistics?
 Even in everyday life, we need to make decisions under ambiguous
conditions or when there is a huge amount of information. In such
conditions, we need an effective tool to organize the information
that we have.
 Statistics provides such a mathematical tool, with which we can
summarize the existing information and/or make predictions
or inferences.
Why Do We Need Statistics?
 Statistics is the science of learning from data, and of
measuring, controlling, and communicating uncertainty; and
it thereby provides the navigation essential for controlling the
course of scientific and societal advances (Davidian, M. and
Louis, T. A., 10.1126/science.1218685).
 Statisticians apply statistical thinking and methods to a wide
variety of scientific, social, and business endeavors in such
areas as astronomy, biology, education, economics,
engineering, genetics, marketing, medicine, psychology,
public health, sports, among many. "The best thing about
being a statistician is that you get to play in everyone else's
backyard." (John Tukey, Bell Labs, Princeton University)
Basic Concepts
Descriptive and Inferential
 Two kinds of statistics can be distinguished
 Descriptive statistics (deduction)
 is the discipline of quantitatively describing the main features of
a collection of data
 Suppose that you visited each shop in your town and checked
the prices of the calculators.
Basic Concepts
Descriptive Statistics
 Let’s say you decided to buy Casio. Now we need to rearrange our
table to see the lowest price for Casio.

Shop                        Casio ZX (TL)  Sharp Q (TL)  Yumatu 1.0 (TL)
Teknosa                     5              6             2
Mediamarkt                  6              4             3
Migros                      6              5             3
BİM                         4              6             1
KİPA                        5              4             3
Selim Kırtasiye             7              5             3
Cafer’in Yeri               6              5             3
Elektro world               5              5             2
Tchibo                      7              5             3
Ayfer Kırtasiye             6              4             4
Cümbüş Kırtasiye            6              5             3
Koçtaş                      7              5             2
Abdurrahman Abinin Tezgahı  4              5             3
Bizim sokaktaki tezgah      3              5             2
Sizin sokaktaki tezgah      4              4             3
Basic Concepts
Descriptive Statistics
 As you can see, Bizim sokaktaki tezgah offers the best price for
Casio (3 TL).
 Looking closely at the table, we can see other characteristics of
the distribution.
 The most common price for Casio is 6 TL. Thirty-three percent of
the shops sell Casio for 6 TL (5/15*100=33.33). The highest price
for Casio is 7 TL, and twenty percent of the shops offer that price
(3/15*100=20.00).
 Based on the present table, several other deductions can be made.
For instance,
 which shop offers the best price for Yumatu, or
 in which shop we see the biggest difference between the prices of
Sharp and Yumatu.
Shop                        Casio ZX (TL)  Sharp Q (TL)  Yumatu 1.0 (TL)
Selim Kırtasiye             7              5             3
Tchibo                      7              5             3
Koçtaş                      7              5             2
Mediamarkt                  6              4             3
Migros                      6              5             3
Cafer’in Yeri               6              5             3
Ayfer Kırtasiye             6              4             4
Cümbüş Kırtasiye            6              5             3
Teknosa                     5              6             2
KİPA                        5              4             3
Elektro world               5              5             2
BİM                         4              6             1
Abdurrahman Abinin Tezgahı  4              5             3
Sizin sokaktaki tezgah      4              4             3
Bizim sokaktaki tezgah      3              5             2
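The descriptive summaries read off the Casio column can also be computed directly. The following is a minimal Python sketch, not part of the original slides; the variable and function names (casio_prices, percent_of_shops) are illustrative assumptions, and the prices are copied from the table.

```python
from collections import Counter

# Casio ZX prices (TL) copied from the table, keyed by shop name.
casio_prices = {
    "Teknosa": 5, "Mediamarkt": 6, "Migros": 6, "BİM": 4, "KİPA": 5,
    "Selim Kırtasiye": 7, "Cafer’in Yeri": 6, "Elektro world": 5,
    "Tchibo": 7, "Ayfer Kırtasiye": 6, "Cümbüş Kırtasiye": 6,
    "Koçtaş": 7, "Abdurrahman Abinin Tezgahı": 4,
    "Bizim sokaktaki tezgah": 3, "Sizin sokaktaki tezgah": 4,
}

# Shop with the lowest Casio price.
cheapest_shop = min(casio_prices, key=casio_prices.get)

# Frequency of each price, and the most common price (the mode).
price_counts = Counter(casio_prices.values())
most_common_price, most_common_count = price_counts.most_common(1)[0]


def percent_of_shops(price):
    """Percentage of shops that sell the Casio at exactly this price."""
    return 100 * price_counts[price] / len(casio_prices)


print(cheapest_shop)                   # Bizim sokaktaki tezgah
print(most_common_price)               # 6
print(round(percent_of_shops(6), 2))   # 33.33
print(percent_of_shops(7))             # 20.0
```

Sorting the table by hand and counting with Counter answer the same questions; the code only makes the counting explicit.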
Basic Concepts
Inferential Statistics
 Inferential Statistics (induction)
 aims to make predictions based on the analysis of numeric
data.
 Inferential statistics is about probability.
 With the aid of inferential statistics, we can see whether our
predictions are better than chance.
Basic Concepts
Inferential Statistics
 Let’s return to our example about your English skills.
When you get a certain score on a test (110 points in our
example), at least three questions arise:
 Q1: Is this your true score?
 Q2: What is the meaning of your score?
 Q3: Can we take the score in this test as a predictor of
prospective (future) success in Psychology courses?
Basic Concepts
Inferential Statistics
 Q1: Is this your true score?
 Were you tired when you took the test?
 Or did the test cover subjects that you are very familiar with?
 Or were you simply lucky? (Lucky guesses are an inevitable part of
multiple-choice tests.)
Basic Concepts
Inferential Statistics
 One way to see whether your score was affected by chance or other
factors is to take an equivalent test or the same test again. Of
course, you would learn the items if you took the same test. Let’s say
you found equivalent tests and completed ten of them, so that,
together with your original score, you have eleven scores.

Test Number  Score
1            110
2            96
3            98
4            96
5            112
6            98
7            106
8            106
9            96
10           98
11           89
Mean         100.45
Basic Concepts
Inferential Statistics
 So, which of them is your true score?
 Should we accept the mean as your true score?
 But you should note that you never scored 100.45, and it does not
seem possible to get such a score.
So, what we need to do first is to find out (predict) your true score.
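As a quick check of the mean in the table, here is a minimal Python sketch (not part of the original slides; the variable names are illustrative). It also confirms that the mean is not one of the scores you actually obtained.

```python
from statistics import mean

# The eleven test scores from the table.
scores = [110, 96, 98, 96, 112, 98, 106, 106, 96, 98, 89]

mean_score = mean(scores)          # sum of the scores divided by their count
print(round(mean_score, 2))        # 100.45
print(mean_score in scores)        # False: the mean was never actually observed
```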
Basic Concepts
Inferential Statistics
 Q2: What is the meaning of your score?
 What is the range of scores that can be obtained on the test?
 Let’s say the possible range for the scores is between 25 and
150. Is it enough to say your score is OK?
 What you need is a reference point. If you find a
way to compare your score with a reference score, you can
decide whether your English is good or bad.
 There could be two kinds of reference points
Basic Concepts
Inferential Statistics
 You can ask your classmates to complete the same test and
simply evaluate your rank among their scores.
 Let’s say you score better than sixty percent of your classmates
on that test.
 Shall we take that as evidence of your superiority in
English? Let’s examine the table.
Basic Concepts
Inferential Statistics
 As you can see, 30% of the class scored higher than you.
 The difference between the next higher score and your score is 36 points.
 The difference between the next lower score and your score is 1 point.

#   Name   Test Score  Cumulative Percent
1   Ayda   148         100
2   Funda  146         90
3   Melda  146         90
4   You    110         70
5   Selda  109         60
6   Selma  109         60
7   Şeyma  108         40
8   Ceyda  108         40
9   Arda   108         40
10  Hülya  107         10
Do you still think your proficiency is better than most of your classmates’?
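The cumulative percentages and the gaps around your score can be recomputed from the raw scores. This is a minimal Python sketch, not part of the original slides; it assumes that "cumulative percent" means the percentage of scores at or below a given score, which reproduces the column in the table. All names are illustrative.

```python
# Test scores copied from the table.
class_scores = {
    "Ayda": 148, "Funda": 146, "Melda": 146, "You": 110, "Selda": 109,
    "Selma": 109, "Şeyma": 108, "Ceyda": 108, "Arda": 108, "Hülya": 107,
}
scores = list(class_scores.values())
your_score = class_scores["You"]


def cumulative_percent(score, all_scores):
    """Percentage of scores that are less than or equal to `score`."""
    at_or_below = sum(1 for s in all_scores if s <= score)
    return 100 * at_or_below / len(all_scores)


# Percentage of the class scoring strictly below you.
percent_below_you = 100 * sum(1 for s in scores if s < your_score) / len(scores)

# Distance to the nearest score above and below yours.
gap_to_next_higher = min(s for s in scores if s > your_score) - your_score
gap_to_next_lower = your_score - max(s for s in scores if s < your_score)

print(cumulative_percent(your_score, scores))   # 70.0
print(percent_below_you)                        # 60.0
print(gap_to_next_higher, gap_to_next_lower)    # 36 1
```

The rank alone hides the fact that you are 1 point above the group below you and 36 points below the group above you.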
Basic Concepts
Inferential Statistics
 As a second way, you can compare your score with national
cut points.
 For instance, test developers might publish a chart to interpret
the scores: 20-70 beginner, 71-90 intermediate, 91-110
upper-intermediate, 111-150 advanced.
 Your original score was 110. That score is the upper limit for
upper-intermediate. That is, your score is at the border between
upper-intermediate and advanced. According to the manual, you
should be categorized as upper-intermediate. Do you agree with that?
So, the second thing that we infer is whether your score significantly
differs from a meaningful reference point.
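The cut-point chart can be written as a small lookup. This is a minimal Python sketch, not part of the original slides; the names CUT_POINTS and classify are illustrative assumptions, and the bands are the ones from the hypothetical chart in the example.

```python
# Score bands from the example chart: (lowest score, highest score, label).
CUT_POINTS = [
    (20, 70, "beginner"),
    (71, 90, "intermediate"),
    (91, 110, "upper-intermediate"),
    (111, 150, "advanced"),
]


def classify(score):
    """Return the proficiency label whose band contains `score`."""
    for low, high, label in CUT_POINTS:
        if low <= score <= high:
            return label
    raise ValueError(f"score {score} is outside the chart")


print(classify(110))   # upper-intermediate
print(classify(111))   # advanced
```

A single point separates the two labels, which is why we need to ask whether a score differs significantly from the cut point rather than just which side it falls on.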
Basic Concepts
Inferential Statistics
 Q3: Can we take the score on this test as a predictor of
prospective (future) success in Psychology courses? On
what basis?
 Let’s say some famous psychologists took the same test at the
very beginning of their first semester at Çağ University.
Basic Concepts
Inferential Statistics
 As the table shows, there is a relation between the test scores
and GPA.
 As the proficiency scores increase, GPA increases.
 This pattern is called positive correlation.
 In the case of negative correlation, one score decreases as the
other score increases.
 If the correlation (relation) between proficiency and GPA is strong
enough, then we can predict your future success from your score.

Name           English Proficiency  GPA 1st Year
Erikson        145                  85
Skinner        135                  82
Sigmund Freud  95                   78
Pavlov         90                   77
Reich          90                   76
Basic Concepts
Inferential Statistics
 Considering the table, we can see that your proficiency score is
between Skinner’s and Freud’s. So, your GPA will most probably be
between 78 and 82.
 Congratulations, you have the potential to become a better
psychologist than Freud.
So, the third thing that we need to infer is whether we can
predict your GPA from your proficiency score.
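To put a number on "strong enough", one common choice is Pearson's correlation coefficient r. The slides do not name a specific coefficient, so using Pearson's r here is an assumption. The minimal Python sketch below computes r for the five hypothetical psychologists and looks up the GPA values of the two neighbours that bracket a proficiency score of 110; all names are illustrative.

```python
from math import sqrt

# Data copied from the table (hypothetical scores from the example).
proficiency = [145, 135, 95, 90, 90]
gpa = [85, 82, 78, 77, 76]


def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y scaled to [-1, 1]."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / sqrt(variation_x * variation_y)


r = pearson_r(proficiency, gpa)
print(round(r, 2))   # close to +1: a strong positive correlation

# Neighbours in the table that bracket your proficiency score.
your_score = 110
pairs = list(zip(proficiency, gpa))
lower = max(p for p in pairs if p[0] <= your_score)   # closest score below
upper = min(p for p in pairs if p[0] >= your_score)   # closest score above
gpa_bracket = (lower[1], upper[1])
print(gpa_bracket)   # (78, 82)
```

With only five cases, this is an illustration of the idea and not evidence that the prediction would hold in general.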
Basic Concepts
Inferential Statistics
 In sum, Descriptive Statistics is about describing certain
characteristics of the sample or population. Inferential Statistics,
in contrast, is about predicting certain characteristics
of the population by evaluating the characteristics of the
sample.
 At this point, we need to define what a sample and a
population are.
Basic Concepts
Population and Sample
 A population can be defined as all the people or items
with the characteristic one wishes to understand.
 Let’s say you believe that blonde girls are not that clever.
 In this example, all blonde girls in the world are your population.
 The characteristic that you are interested in is their level of
intelligence.
Basic Concepts
Population and Sample
 To clarify your hypothesis, you need to limit your population.
 So, are you also interested in girls who dyed their hair blonde?
Probably not.
 Then you should restate your argument: Inherently blonde
girls are not that clever.
 Once you define your population properly, you can start collecting
data on the characteristic that you are interested in.
Basic Concepts
Population and Sample
 A sample is a subset of the population that we can reach and
collect data from.
 Let’s say you are really eager to conduct a study on the level
of intelligence of inherently blonde girls.
 Since it is not possible to reach every blonde girl in the world,
you need to find a subgroup and give them your IQ test.
Basic Concepts
Population and Sample
 Sampling is a vital issue for statistics and research methods.
 The main purpose of sampling is to reach the most
representative subset of the population.
 If your sample is not representative, your findings will not be
valid.
Basic Concepts
Population and Sample
 In the 1936 American presidential election, Roosevelt, a
Democrat, was being challenged by the Republican Alf Landon.
One of the leading magazines of the day, Literary Digest,
surveyed voter preferences by mailing questionnaires to 10
million people whose names were gathered from lists of
automobile and telephone owners. Over two million
people responded, and the results indicated that Landon
would beat Roosevelt by a landslide.
Basic Concepts
Population and Sample
 In fact, Roosevelt beat Landon by one of the largest margins
ever.
 This was one of the largest surveys ever taken. How could it
have been so wrong?
 The US was in the middle of the Great Depression in 1936, and
only a minority of people were financially secure enough to
own a car or telephone. They tended to vote Republican.
Most other Americans were worried about buying enough
food to feed their families, and they tended to vote Democratic.
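The effect of drawing a sample only from car and telephone owners can be illustrated with a small simulation. This is a minimal Python sketch, not part of the original slides. The proportions below (share of owners, and how each group votes) are invented for illustration and are not historical estimates; only the direction of the bias matches the story.

```python
import random

rng = random.Random(1936)  # fixed seed so the sketch is reproducible

# Hypothetical proportions, chosen only to illustrate the mechanism.
P_OWNER = 0.25                # share of people owning a car or telephone
P_LANDON_IF_OWNER = 0.65      # owners lean Republican
P_LANDON_IF_NOT_OWNER = 0.30  # everyone else leans Democratic

# Each voter is a pair: (owns a car or telephone, votes for Landon).
population = []
for _ in range(100_000):
    owns = rng.random() < P_OWNER
    p_landon = P_LANDON_IF_OWNER if owns else P_LANDON_IF_NOT_OWNER
    population.append((owns, rng.random() < p_landon))


def landon_share(voters):
    """Fraction of the given voters who vote for Landon."""
    return sum(1 for _, votes_landon in voters if votes_landon) / len(voters)


true_share = landon_share(population)

# Biased sample: drawn only from owners, as the mailing lists were.
owners_only = [voter for voter in population if voter[0]]
biased_share = landon_share(rng.sample(owners_only, 2000))

# Random sample: every voter has the same chance of being selected.
random_share = landon_share(rng.sample(population, 2000))

print(f"true: {true_share:.2f}  biased: {biased_share:.2f}  random: {random_share:.2f}")
```

In this sketch the owners-only sample predicts a Landon majority although Landon loses in the simulated population, while a much smaller random sample lands close to the true share.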
Basic Concepts
Population and Sample
 To ensure representativeness, inferential statistics requires
random sampling.
 By random sampling, we ensure that each possible sample of
the same size has an equal probability of being selected from
the population.
 For instance, suppose that we wish to select five persons at
random from our current statistics class.
 What we need to do is to
 write the name of each class member on a slip of paper,
 put those slips in a gallon jar,
 shake and tumble the contents of the jar well, and
 withdraw five slips from the lot.
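The jar procedure can be sketched in Python with the standard library function random.sample, which draws without replacement so that every group of five is equally likely. This is not part of the original slides; the roster of 43 placeholder names is a hypothetical stand-in for the real class list.

```python
import random

# Hypothetical roster: one "slip of paper" per class member.
class_list = [f"Student {i}" for i in range(1, 44)]

# Shake the jar and withdraw five slips (no name can be drawn twice).
chosen = random.sample(class_list, k=5)
print(chosen)
```

Each run gives a different group of five, just as each shake of the jar would.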
Basic Concepts
Variables and Constants
 A variable is a characteristic that can take on different
values.
 Considering our hypothesis about blondes, you can see that the
variable that we are interested in is the level of intelligence.
 When we measure blonde girls’ intelligence, we can see that
their scores are not identical.
 In fact, statistics is about variability. With the aid of
statistical techniques, we try to organize and understand the
variability in nature.
Basic Concepts
Variables and Constants
 A constant is a characteristic that is identical for each
member of the sample.
 For instance, hair color and gender would be constant for our
hypothetical study on the level of intelligence of blonde girls.
 Additionally, constants delimit the applicability of our findings.
 Even if we observe an intelligence deficiency in blonde girls, it
doesn’t say anything about redheads or blonde boys.
Scales of Measurement
 Measurement is the process of assigning numbers to
observations. Let’s discuss how we measure the
properties below.
 Weight of a box: a weighing machine
 Length of a table: a ruler
 Beauty of a competitor in a beauty contest: (?)
 Gender of a participant: (?)
 Success of a football team in the league: (?)
Scales of Measurement
 What about the meaning of the numbers that we assign? Are they
the same?
 If the weighing machine shows zero, can we take that number as an
indicator of no weight at all?
 What about the judge in the beauty contest? If he assigns zero to a
competitor, does it mean she has no beauty?
 Let’s say Galatasaray won 30 games last year, and Fiskobirlik won 15
games. Can we say Galatasaray won twice as many games as
Fiskobirlik?
 Let’s say the rank of Galatasaray is 2 and that of Fiskobirlik is 12. Does
that mean Galatasaray’s success is 6 times higher than Fiskobirlik’s?
 What about the beauty contest? If Aylin wins the contest and Jale comes
third, does it mean Aylin is three times more beautiful than Jale?
Scales of Measurement
 Apparently, numbers have different meanings in these
situations.
 To distinguish the different kinds of situations, we need to
identify four kinds of measures.
 Nominal Scales
 Ordinal Scales
 Interval Scales
 Ratio Scales
Scales of Measurement
Nominal Scales
 Nominal scales are the simplest kind of scale.
 Some variables are qualitative in nature rather than
quantitative.
 For instance, biological sex, types of cheese, brand names of
cell phones, etc.
 Numbers in nominal scales have no meaning other than
indicating different categories.
 If we assign 1 to males and 2 to females, there is no implication
that females are “more than” males on some dimension.
Scales of Measurement
Nominal Scales
 Nominal scales have only two requirements:
 The categories have to be mutually exclusive: an observation
cannot fall into more than one category
 The categories have to be exhaustive: there must be enough
categories for all observations
 Examples
 Male and Female are mutually exclusive and exhaustive
categories for biological sex.
 What about gender (social sex)? Some individuals in the biological
female category might feel much more like they are male. So,
we need to include other categories like Gay, Lesbian,
transsexual, etc.
Scales of Measurement
Ordinal Scales
 A more complex scale than nominal ones
 The categories must still be mutually exclusive and exhaustive
 They also indicate the order of magnitude of some variable
 The outcome of ordinal scales is a set of ranks
 Socio-economic Status: Low-Middle-High
 College students: Freshmen, Sophomores, Juniors, and Seniors
 Numbers can be assigned to the categories, but those numbers
have no meaning other than rank order.
 Let’s consider our example of SES
 Is the difference between Low and Middle equal to the
difference between Middle and High?
Scales of Measurement
Interval Scales
 The next major level of complexity is the interval scale
 Interval scales have all the properties that ordinal scales have.
Additionally,
 The intervals (distances) between scores have the same meaning
anywhere on the scale.
 Examples:
 Level of depression on Beck Depression Scale
 Pain temperature scales
 Celsius and Fahrenheit scales
Scales of Measurement
Interval Scales
 Let’s discuss the Celsius scale
 The difference between 10°C and 20°C is equal to the difference
between 20°C and 30°C.
 That is, the energy you need to raise the temperature of a certain
amount of water from 10 to 20 is equal to the amount of energy for an
increase from 20 to 30.
 What about 0°C? Does it mean there is no heat?
Scales of Measurement
Ratio Scales
 The most complex and advanced scales
 Ratio scales possess all the properties of interval scales and, in
addition, have an absolute zero point
 Gram for weight and centimeter for height are some
examples.
 If something is zero grams, then it has no mass.
 If something is zero centimeters, then it has no length.
 Kelvin is a good example.
 Unlike Celsius and Fahrenheit, Kelvin has an absolute
zero point.
 That is, at zero Kelvin a substance would have no molecular
motion (energy) and, therefore, no heat
Why does absolute zero point matter?
 Imagine we want to measure the temperature of our
classroom on the Celsius scale.
 Let’s say it is 30°C. One of our friends says it was 15°C
last winter. So, does it mean it is now twice as hot as last
winter?
Why does absolute zero point matter?
 No, it doesn’t.
 Since the zero point is not absolute on the Celsius scale, we can
move it up or down.
 Let’s say we decided to move it 10°C lower.
 Thus, our new Celsius scale would show 40°C for the current
temperature, and 25°C for last winter. The ratio is now 40/25 = 1.6
instead of 2, although the temperatures themselves have not changed.
 So, it is not meaningful to assert that a temperature of 30°C is
twice as hot as one of 15°C or that a rise from 30°C to 33°C is a
10% increase.
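The argument can be checked numerically. The minimal Python sketch below is not part of the original slides. It computes the ratio of the two classroom temperatures on the Celsius scale, on the shifted scale from the example, and in Kelvin, using the standard conversion K = °C + 273.15. The variable names are illustrative.

```python
now_c, last_winter_c = 30.0, 15.0

# Ratio on the ordinary Celsius scale.
ratio_celsius = now_c / last_winter_c

# Ratio after moving the zero point 10 degrees lower (every reading rises by 10).
ratio_shifted = (now_c + 10) / (last_winter_c + 10)

# Ratio on the Kelvin scale, which has an absolute zero point.
ratio_kelvin = (now_c + 273.15) / (last_winter_c + 273.15)

print(ratio_celsius)            # 2.0
print(ratio_shifted)            # 1.6
print(round(ratio_kelvin, 3))   # 1.052
```

The differences between the readings stay at 15 degrees on all three scales; only the ratios depend on where zero is placed, and only the Kelvin ratio is physically meaningful.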
Final notes about scales
 The ratio scale subsumes all other scales
 Ratio>Interval>Ordinal>Nominal
 Computation with the scores
 Nominal scales: Clustering
 Ordinal Scales: Clustering and rank order
 Interval Scales: addition and subtraction
 Ratio Scales: addition, subtraction, multiplication and division
Variables and Computational Accuracy
 Variables may be either discrete (kesikli) or continuous
(sürekli).
 Discrete Variable
 A discrete variable can take on only certain values
 For instance, the number of students in our classroom is discrete. It is 43
this week, but it was 42 last week. No value between these
two is possible.
 Continuous variables can take on any value within a range.
 For instance, temperature can be 29°C, 29.4°C or 30°C
Variables and Computational Accuracy
 Even though a variable may be continuous in theory, the process of
measurement always reduces it to a discrete one.
 Imagine the true weight of a tomato were 0.23138 kilogram.
 A standard weighing machine is not that sensitive.
 It would measure weight to the nearest thousandth of a kilogram.
 So, it would show 0.231
 Is that a problem?
Variables and Computational Accuracy
 Within the limits of the recording equipment, it is up to the
investigator to determine the degree of accuracy appropriate
to the problem at hand
 If you want to buy a tomato, 0.00038 kilogram is not
important.
 What if you would like to buy gold?
 1 kg of tomatoes is 1.20 TL. So, 0.00038 kg is 0.000456 TL
 1 g of gold is 101 TL. So, 0.00038 kg (0.38 g) is 38.38 TL
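The cost of the rounding error can be worked out step by step. This is a minimal Python sketch, not part of the original slides; the prices are the ones given in the example, the variable names are illustrative, and the results are rounded for display because floating-point arithmetic itself is only approximate.

```python
true_weight_kg = 0.23138
displayed_kg = round(true_weight_kg, 3)   # what a scale with 0.001 kg resolution shows
error_kg = true_weight_kg - displayed_kg  # weight lost to rounding

TOMATO_TL_PER_KG = 1.20
GOLD_TL_PER_GRAM = 101

tomato_error_tl = error_kg * TOMATO_TL_PER_KG
gold_error_tl = error_kg * 1000 * GOLD_TL_PER_GRAM  # convert kg to grams first

print(displayed_kg)                # 0.231
print(round(error_kg, 5))          # 0.00038
print(round(tomato_error_tl, 6))   # value of the error when buying tomatoes
print(round(gold_error_tl, 2))     # value of the same error when buying gold
```

The same measurement error is negligible for tomatoes and costly for gold, which is why the required accuracy depends on the problem at hand.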
Variables and Computational Accuracy
 In Psychology, we also need to be very careful about
computational accuracy.
 If a psychologist works on a theoretical construct which is not
directly related to individuals’ wellbeing, accuracy will not be
that important
 On a paper-and-pencil attitude measure, it will not be important if a
participant rates his/her favorability toward an attitude object as 7 while
his/her true attitude is 8
 What about intelligence, aptitude, or skills?