Packing Densities of Permutations


Jan. 29
“Statistics” for one quantitative variable…
Mean and standard deviation (last week!)
“Robust” measures of location (median and its friends)
Quartiles, IQR, five-number summary, Box plots
Percentiles
Transforming data…
Rescale: Y = c times X
Recenter: Y = X plus a
other transformations
adding variables to each other
Standardizing data…
Population vs. Sample
NH polls, 1/26/04 - errors
[Chart: Errors from 1/26 NH polls]
A statistic is anything that can be computed from data.
STATISTICS of a single quantitative variable
MEAN
MEDIAN
QUARTILES ( Q1, Q3 )
Five-number summary
Boxplots
Interquartile range
PERCENTILES / QUANTILES / FRACTILES
STANDARD DEVIATION
VARIANCE
Statistics of one variable…
Median: the middle value when the values are ranked from smallest to largest
(or the average of the two middle values when the count is even)
“Robust”
Trimmed mean
Midmean
Geometric mean
“RMS mean”
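The median rule above, and the trimmed mean listed beside it, can be sketched in a few lines of Python. This is only an illustration: the data values are made up, and the parameter `k` (how many values to drop from each end) is a hypothetical name, not something the slides define.

```python
from statistics import mean

def median(values):
    """Middle value of the ranked data, or the average of the two middle values."""
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2

def trimmed_mean(values, k):
    """Mean after dropping the k smallest and k largest values (k is hypothetical)."""
    ranked = sorted(values)
    if 2 * k >= len(ranked):
        raise ValueError("k trims away all the data")
    return mean(ranked[k:len(ranked) - k])

# Hypothetical data with one extreme value
data = [3, 1, 4, 1, 5, 100]
print("median:", median(data))
print("mean:", mean(data))
print("trimmed mean (k=1):", trimmed_mean(data, 1))
```

The single extreme value pulls the ordinary mean far from the bulk of the data, while the median and the trimmed mean barely notice it.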
Number of Colleges
1  2  1  2  12 1  1  1  9  1  1  5
1  7  8  6  1  1  10 1  5  5  7  8
1  6  1  10 4  1  1  10 10 5  7  7
1  5  14 8  1  6  1  1  5  8  1  14
1  1  5  6  6  7  5  13 14 12 5  7
1  8  1  12 12 6  9  8  7  1  8  6
Number of Colleges (sorted; read down each column)
1  1  2  6  8   12
1  1  4  6  8   12
1  1  5  6  8   12
1  1  5  6  8   13
1  1  5  6  8   14
1  1  5  7  8   14
1  1  5  7  9   14
1  1  5  7  9
1  1  5  7  10
1  1  5  7  10
1  1  5  7  10
1  1  6  7  10
1  2  6  8  12
Mean vs. Median
Large tails affect the mean more than the median.
So:
Right-skewed distribution → Mean right of median
Left-skewed distribution → Mean left of median
Colleges (Datadesk histogram): median = 5, mean = 5.36
Salaries: median = 60,000, mean = 106,875
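The colleges figures can be checked against the 72 values listed earlier. The sketch below rebuilds the data from a tally of how often each value appears (the tally was counted from the listed data; the variable names are illustrative) and uses Python's standard `statistics` module.

```python
from statistics import mean, median

# How many times each value appears among the 72 "Number of Colleges" entries
counts = {1: 25, 2: 2, 4: 1, 5: 9, 6: 7, 7: 7, 8: 7, 9: 2, 10: 4, 12: 4, 13: 1, 14: 3}
colleges = [value for value, count in counts.items() for _ in range(count)]

print("n      =", len(colleges))
print("median =", median(colleges))
print("mean   =", round(mean(colleges), 2))
```

The mean comes out slightly above the median, which fits the right-skew rule stated above.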
So, which measure of “center” is best?
All the measures agree (roughly) when the distribution is symmetrical
Mean has attractive mathematical properties
Also, the mean is related to the total, if that’s what you care about
Median may be more “typical” when the distribution is nonsymmetrical
A measure is “robust” if it works reasonably well under a wide variety of circumstances
Medians are robust
Computing percentiles
To calculate the 20th percentile:
Rank the values from smallest to largest
Compute 20% of n…
20% of 72 = 14.4
Count off that many values (from lowest)…
The value at which you stop is the 20th percentile.
What if you stop between values?
Number of Colleges (sorted; read down each column)
1  1  2  6  8   12
1  1  4  6  8   12
1  1  5  6  8   12
1  1  5  6  8   13
1  1  5  6  8   14
1  1  5  7  8   14
1  1  5  7  9   14
1  1  5  7  9
1  1  5  7  10
1  1  5  7  10
1  1  5  7  10
1  1  6  7  10
1  2  6  8  12
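The count-off procedure can be sketched as follows. The slide leaves open what to do when you stop between values, so the tie-break used here is an assumption, one of several conventions in use: if p% of n is not a whole number, move up to the next ranked value; if it is a whole number, average that value with the next one. The function name `percentile` and the tally `counts` are illustrative.

```python
def percentile(values, p):
    """p-th percentile by counting off p% of n ranked values (p a whole number, 0 < p < 100).

    Assumed tie-break, not taken from the slides: between two ranks, take the
    next value up; exactly on a rank, average that value and the next one.
    """
    ranked = sorted(values)
    n = len(ranked)
    k, remainder = divmod(p * n, 100)   # e.g. 20% of 72 = 14.4 -> k = 14, remainder != 0
    if remainder:
        return ranked[k]                # the (k + 1)-th value (list index k)
    return (ranked[k - 1] + ranked[k]) / 2

# The 72 college counts, rebuilt from a tally of the listed data
counts = {1: 25, 2: 2, 4: 1, 5: 9, 6: 7, 7: 7, 8: 7, 9: 2, 10: 4, 12: 4, 13: 1, 14: 3}
colleges = [value for value, count in counts.items() for _ in range(count)]

print("20th percentile:", percentile(colleges, 20))
print("50th percentile:", percentile(colleges, 50))
```

For the 20th percentile the count stops between the 14th and 15th values; both are 1 in this data set, so any reasonable convention gives the same answer here.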
QUARTILES
Lower quartile (Q1) = 25th percentile
Upper quartile (Q3) = 75th percentile
(What’s Q2?)
INTERQUARTILE RANGE ( IQR ) = Q3 minus Q1
Five-number summary
maximum (or, say, 95th percentile)
Q3
median
Q1
minimum (or, say, 5th percentile)
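As a rough illustration, the five-number summary and the IQR for the colleges data can be computed with Python's standard `statistics.quantiles` (Python 3.8 or later). Its default interpolation method is only one of several quartile conventions, so other software may give slightly different quartiles on other data; the helper name `five_number_summary` is made up for this sketch.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Quartiles use the default method of statistics.quantiles;
    textbooks and packages differ slightly on this convention.
    """
    q1, q2, q3 = quantiles(values, n=4)
    return min(values), q1, q2, q3, max(values)

# The 72 college counts, rebuilt from a tally of the listed data
counts = {1: 25, 2: 2, 4: 1, 5: 9, 6: 7, 7: 7, 8: 7, 9: 2, 10: 4, 12: 4, 13: 1, 14: 3}
colleges = [value for value, count in counts.items() for _ in range(count)]

low, q1, med, q3, high = five_number_summary(colleges)
print("min, Q1, median, Q3, max:", low, q1, med, q3, high)
print("IQR =", q3 - q1)
```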
Linear Transformations
If you MULTIPLY or DIVIDE a variable by a constant…
Y = c times X
Y=X/c
then…
measures of center are multiplied or divided by c
measures of spread are multiplied or divided by |c|
If you ADD or SUBTRACT a constant from a variable…
Y=X+a
Y=X–a
then…
measures of center are increased (decreased) by a
measures of spread are UNCHANGED.
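A quick numerical check of these rules, using made-up data and arbitrary constants (a negative `c` is chosen on purpose, to show why spread scales by |c| rather than c):

```python
from statistics import mean, median, pstdev

x = [2, 4, 4, 6, 9]      # hypothetical data
c, a = -3, 10            # arbitrary constants for illustration

scaled = [c * v for v in x]     # Y = c times X
shifted = [v + a for v in x]    # Y = X + a

print("X:       mean", mean(x), "median", median(x), "sd", pstdev(x))
print("c * X:   mean", mean(scaled), "median", median(scaled), "sd", pstdev(scaled))
print("X + a:   mean", mean(shifted), "median", median(shifted), "sd", pstdev(shifted))
```

Multiplying by -3 flips the sign of the mean and median, but the standard deviation stays positive and is three times as large; adding 10 moves the center and leaves the standard deviation alone.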
More transformations
ADDING VARIABLES:
W = X + Y
Mean(W) = Mean(X) + Mean(Y)
Standard Deviation of (W) — anything can happen
OTHER TRANSFORMATIONS:
Y = X squared?
Y = log(X)?
…NO RELIABLE RULES for mean or std. dev.
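To see why "anything can happen" to the standard deviation of a sum, here is a small sketch with made-up data: the same X is added to a Y that moves with it and to a Y that moves against it. The means add in both cases, but the standard deviations of the sums are very different.

```python
from statistics import mean, pstdev

x = [1, 2, 3, 4]             # hypothetical data
y_same = [1, 2, 3, 4]        # rises when x rises
y_opposite = [4, 3, 2, 1]    # falls when x rises (same mean and sd as y_same)

w_same = [xi + yi for xi, yi in zip(x, y_same)]
w_opposite = [xi + yi for xi, yi in zip(x, y_opposite)]

print("mean(X) + mean(Y) =", mean(x) + mean(y_same))
print("mean(W), Y rising :", mean(w_same), " sd(W):", pstdev(w_same))
print("mean(W), Y falling:", mean(w_opposite), " sd(W):", pstdev(w_opposite))
```

In the second case every sum equals 5, so the standard deviation of W is zero even though X and Y each vary.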
Standardized Variables
Write x̄ and S for the mean and standard deviation of X
Then form the transformed variable:
Z = (X - x̄) / S
Then…
mean (Z) = 0
std dev (Z) = 1
Z answers the question: How many standard deviations is this value above (or below) the mean?
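A minimal sketch of standardizing a variable, on made-up data. It assumes S is the sample standard deviation (dividing by n - 1), which is what `statistics.stdev` computes; the slides do not say which version is meant, and the conclusion (mean 0, standard deviation 1) holds either way as long as the same version is used throughout. The function name `standardize` is illustrative.

```python
from statistics import mean, stdev

def standardize(values):
    """Return the z-scores (x - mean) / S for each value.

    Assumes S is the sample standard deviation and that the values
    are not all equal (otherwise S is 0 and the division fails).
    """
    x_bar = mean(values)
    s = stdev(values)
    return [(v - x_bar) / s for v in values]

x = [2, 4, 4, 6, 9]          # hypothetical data
z = standardize(x)

print("z-scores:", [round(v, 2) for v in z])
print("mean(Z)    =", round(mean(z), 10))
print("std dev(Z) =", round(stdev(z), 10))
```

Each z-score reads directly as "this many standard deviations above (positive) or below (negative) the mean".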