Statistics & Data Analysis
Course Number: B01.1305
Course Section: 31
Meeting Time: Wednesday 6-8:50 pm
Midterm Review
Midterm Format
• Open book and open notes
  • No solution guides or other resources are permitted
• A scientific calculator will be required
• All questions will be short answer
• Entire class period is available for exam
Professor S. D. Balkin -- March 12, 2003
Exam Coverage
• Chapter 1
  • Understand reasons for statistics
• Chapter 2
  • Distinguish between qualitative and quantitative variables
  • Describe and interpret plots of data
  • Understand and calculate measures of center
  • Understand and calculate measures of variation
Exam Coverage
• Chapter 3
  • Understand different sources of probabilities
  • Understand and use basic principles of probability
    • Addition
    • Complements
    • Multiplication
  • Calculate conditional and unconditional probabilities
  • Understand, use, and determine statistical independence
  • Be able to construct and interpret probability tables and trees
• Chapter 4
  • Understand probability distributions
  • Calculate the expected value and standard deviation of a probability distribution
Exam Coverage
• Chapter 5: Some Special Probability Distributions
  • Calculate probability of an event using
    • Counting methods
    • Binomial distribution
    • Normal distribution
• Chapter 6: Random Samples and Sampling Distributions
  • Understand and identify sources of sample bias
  • Understand difference between the distribution of a summary statistic and distribution of a population
  • Identify the sampling distribution of the sample mean
  • Understand the use of the Central Limit Theorem
  • Interpret a normal probability plot
Exam Coverage
• Chapter 7: Point and Interval Estimation
  • Understand unbiased and efficient estimators
  • Calculate and interpret confidence intervals
    • For population mean with standard deviation known
    • For population proportion
    • For population mean with standard deviation unknown
  • Determine sample sizes for a given confidence level and tolerance width
  • Understand t-distribution
  • Understand key assumptions underlying confidence interval methods
Practice Problems with Answers in Book
• Chapter 2: 2.26
• Chapter 3: 3.35, 3.36, 3.46, 3.47, 3.48, 3.53, 3.54, 3.55, 3.59, 3.60, 3.63, 3.64, 3.65, 3.66, 3.67, 3.68
• Chapter 4: 4.35, 4.36
• Chapter 5: 5.37, 5.38, 5.40, 5.41
• Chapter 6: 6.29, 6.35, 6.36, 6.37
• Chapter 7: 7.41, 7.42, 7.47, 7.48, 7.58, 7.59, 7.60, 7.76, 7.77
Interpretation Review
• Mode: value or category with the highest frequency in the data
• Median: middle value when the data are arranged from lowest to highest
• Mean: sum of measurements divided by the number of measurements
• Variance: average of the squared deviations from the mean
• Empirical Rule: for roughly mound-shaped data, about 68%, 95%, and 99.7% of the values fall within 1, 2, and 3 standard deviations of the mean
• IQR: 75th percentile - 25th percentile
• Random Variable: quantitative result from an experiment that is subject to random variability
• Expected Value: probability-weighted average of possible values
• Permutations: number of ordered sequences of k symbols chosen from r symbols
• Combinations: number of unordered subsets of k symbols chosen from r symbols
• Central Limit Theorem: For any population with finite variance, the sampling distribution of the sample mean is approximately normal if the sample size is sufficiently large.
• Interval estimate: states the range within which a population parameter probably lies
• 95% Confidence interval: about 95% of similarly constructed intervals will contain the parameter being estimated
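Several of these definitions can be checked numerically. The following is a minimal Python sketch using only the standard library; the data set is made up for illustration and is not course data. Note that `statistics.variance` divides by n - 1 (the sample variance), and that quartile conventions differ between textbooks and software, so an IQR computed this way may not match a hand calculation exactly.

```python
import math
import statistics

# Hypothetical measurements, invented purely for illustration.
data = [2, 3, 3, 5, 7, 10]

mode = statistics.mode(data)                 # most frequent value
median = statistics.median(data)             # middle value of the sorted data
mean = statistics.mean(data)                 # sum divided by count
sample_variance = statistics.variance(data)  # squared deviations summed, divided by n - 1
std_dev = math.sqrt(sample_variance)

# Quartiles: statistics.quantiles follows one of several common conventions.
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Counting: choose k = 2 symbols out of r = 5.
n_permutations = math.perm(5, 2)  # order matters
n_combinations = math.comb(5, 2)  # order does not matter

print(mode, median, mean, sample_variance, round(std_dev, 3), iqr)
print(n_permutations, n_combinations)
```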
Question #1
• Fortune magazine publishes a list of the world's billionaires each year. The 1992 list includes 233 individuals. Describe this distribution of wealth. Why do you think the distribution is the way it is (Hint: is this a representative sample)?
[Figure: Histogram of wealth. x-axis: wealth (0 to 40); y-axis: Frequency (0 to 200)]
Question #2
• As a marketing consultant, you observed 50 consecutive shoppers at a grocery store, and recorded how much money each shopper spent in the store.
• (a) Create and interpret a histogram of these data.
• (b) Create and interpret a stem-and-leaf plot of these data.
• (c) Create and interpret a boxplot of these data.
• (d) Provide your client with an executive summary of your analysis.
Question #3
• A narcotics enforcement unit works with customs officers at an airport that serves international travelers on a route that has plausible links to the drug trade. This enforcement unit has developed a smuggler profile that it uses to initiate full searches of people who meet the profile. These profiles typically require meeting a number of conditions such as (a) male under 40, (b) traveling alone, (c) loose clothing, and so on.
• Fully 100% of the travelers who meet the profile were searched, and 10% of those who did not meet the profile were searched. After collecting considerable data, these figures resulted:

  Percentage of people who meet the profile: 4%
  Percentage of people who meet the profile and then are found to have illegal drugs: 35%
  Percentage of people who do not meet the profile and then are found to have illegal drugs: 3%

(a) Based on these figures, what percentage of travelers on this particular route is carrying illegal drugs?
(b) What percentage of the drug-carrying travelers will be captured by this procedure? Assume that all drug carriers who are searched will be captured.
(c) Given that a traveler is carrying illegal drugs (whether captured or not), what is the probability that this person will meet the profile?
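One way to organize the arithmetic for this question is the law of total probability followed by Bayes' rule. The sketch below is a possible approach, not an official solution. It assumes the 35% and 3% figures are conditional rates that apply to everyone inside and outside the profile, and that the 10% search rate for non-profile travelers does not depend on whether they carry drugs. All variable names are illustrative.

```python
# Figures from the problem, read as conditional rates (an assumption).
p_profile = 0.04
p_drugs_given_profile = 0.35
p_drugs_given_no_profile = 0.03
p_search_given_profile = 1.00
p_search_given_no_profile = 0.10  # assumed independent of carrying drugs

p_no_profile = 1 - p_profile

# Joint probabilities: carries drugs AND (does / does not) meet the profile.
p_drugs_and_profile = p_profile * p_drugs_given_profile
p_drugs_and_no_profile = p_no_profile * p_drugs_given_no_profile

# (a) Law of total probability: overall share of travelers carrying drugs.
p_drugs = p_drugs_and_profile + p_drugs_and_no_profile

# (b) Share of drug carriers who are searched (and therefore captured).
p_captured = (p_drugs_and_profile * p_search_given_profile
              + p_drugs_and_no_profile * p_search_given_no_profile)
share_captured = p_captured / p_drugs

# (c) Bayes' rule: probability of meeting the profile given carrying drugs.
p_profile_given_drugs = p_drugs_and_profile / p_drugs

print(f"(a) {p_drugs:.4f}  (b) {share_captured:.4f}  (c) {p_profile_given_drugs:.4f}")
```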
Question #4
• A restaurant has collected data on its customers' orders and has estimated probabilities about what happens after the main course. It was found that 20% of the customers had dessert only, 40% had coffee only, and 30% had both dessert and coffee.
(a) Draw a probability tree for this situation.
(b) Find the probability of the event "had coffee."
(c) Find the probability of the event "did not have dessert."
(d) What percentage of customers will have "neither coffee nor dessert"?
(e) What percentage of customers will have "coffee OR dessert"?
(f) Are the events "had coffee" and "had dessert" mutually exclusive? How do you know?
(g) Given that a customer had coffee, what is the probability that the same customer had dessert?
(h) Are "had dessert" and "had coffee" independent events? How do you know?
(i) Find the conditional probability of having dessert GIVEN that the customer did not have coffee.
(j) Find the conditional probability of having dessert GIVEN that the customer did have coffee.
(k) Based on your analyses above, who is more likely to order dessert, a customer who orders coffee, or one who does not?
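The numeric parts of this question follow from the three given probabilities once the "neither" cell is filled in. The sketch below is one possible way to check hand calculations. It uses exact fractions to avoid rounding noise; the variable names are illustrative.

```python
from fractions import Fraction

# Given probabilities from the problem statement.
p_dessert_only = Fraction(20, 100)
p_coffee_only = Fraction(40, 100)
p_both = Fraction(30, 100)

# (d) The four cells must sum to 1, so "neither" is the remainder.
p_neither = 1 - (p_dessert_only + p_coffee_only + p_both)

p_coffee = p_coffee_only + p_both        # (b) marginal probability of coffee
p_dessert = p_dessert_only + p_both      # marginal probability of dessert
p_no_dessert = 1 - p_dessert             # (c) complement rule
p_coffee_or_dessert = 1 - p_neither      # (e) complement of "neither"

mutually_exclusive = (p_both == 0)       # (f) exclusive only if they never co-occur

p_dessert_given_coffee = p_both / p_coffee                    # (g) and (j)
p_dessert_given_no_coffee = p_dessert_only / (1 - p_coffee)   # (i)

# (h) Independent only if P(both) equals the product of the marginals.
independent = (p_both == p_coffee * p_dessert)

print(p_coffee, p_no_dessert, p_neither, p_coffee_or_dessert)
print(p_dessert_given_coffee, p_dessert_given_no_coffee, mutually_exclusive, independent)
```

Comparing the two conditional probabilities from (i) and (j) addresses part (k).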
Question #5
• Acorn is the acronym for Association of Community Organizations for Reform Now. These data were presented by Acorn to a Joint Congressional Hearing on discrimination in lending. Acorn concluded, "Banks generally have exhibited a pervasive pattern of lending practices that have the effect, intended or not, of racial discrimination. Wide disparities in rejection rates for minority and white applicants, even in comparable income groups, were found in all SMA's, and at nearly every institution studied."
• The data provided are as follows:
  Data: bankdata.txt
  Number of cases: 20
  Variable Names:
  • Name of bank
  • MIN = refusal rate for minority applicants
  • WHITE = refusal rate for white applicants
  • HIMIN = refusal rate for high income minority applicants
  • HIWHITE = refusal rate for high income white applicants
• Using the data provided and the methods learned in class, write a short argument in support of or disputing Acorn's claim that banks have exhibited racial discrimination. Use both graphics and text to help make your case.
Question #6
• Research on insider traders who were arrested revealed that 38% of them committed some other white-collar crime.
• What is the probability that of the last 100 arrested insider traders, 30 committed another crime?
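If the 100 arrests are treated as independent trials with success probability 0.38, this is a binomial calculation. The sketch below is a minimal illustration under that assumption. It reads the question as "exactly 30" and also shows "at most 30" in case that reading is intended. The helper `binomial_pmf` is a hypothetical name introduced here.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n_arrested = 100
p_other_crime = 0.38

# Reading 1: exactly 30 of the 100 committed another crime.
p_exactly_30 = binomial_pmf(30, n_arrested, p_other_crime)

# Reading 2: 30 or fewer committed another crime.
p_at_most_30 = sum(binomial_pmf(k, n_arrested, p_other_crime) for k in range(31))

print(f"P(X = 30)  = {p_exactly_30:.4f}")
print(f"P(X <= 30) = {p_at_most_30:.4f}")
```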
Question #7
Here is a table of American households classified by education and income.

                             Income Class (thousands of dollars)
Education                    <15       15-34     35-49     50+
< 4 years of high school     11,668    7,217     1,909     1,180
4 years of high school       8,088     12,417    5,776     4,279
1-3 years of college         2,626     5,263     3,230     3,173
4+ years of college          1,597     5,189     4,334     7,888

(a) What is the probability that a randomly selected household has an income of at least $50,000?
(b) What is the conditional probability that a household earns at least $50,000 given that the householder completed at least 4 years of college?
(c) What is the conditional probability that the householder completed at least 4 years of college given that the household income is at least $50,000?
(d) Are the random variables household income and years of education independent? Why or why not?
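The probabilities in this question are ratios of cell counts to row, column, or grand totals. The sketch below is one way to compute them. It assumes the sixteen counts are listed row by row, one education level per row, with columns ordered <15, 15-34, 35-49, 50+. The independence check is a simple descriptive comparison of a marginal and a conditional probability, not a formal test.

```python
import math

# Assumption: each list is one education level; columns are the income
# classes <15, 15-34, 35-49, 50+ (thousands of dollars).
counts = {
    "< 4 years of high school": [11668, 7217, 1909, 1180],
    "4 years of high school": [8088, 12417, 5776, 4279],
    "1-3 years of college": [2626, 5263, 3230, 3173],
    "4+ years of college": [1597, 5189, 4334, 7888],
}
HIGH_INCOME = 3                    # column index of the 50+ class
COLLEGE = "4+ years of college"

total = sum(sum(row) for row in counts.values())
high_income_total = sum(row[HIGH_INCOME] for row in counts.values())
college_total = sum(counts[COLLEGE])
college_and_high = counts[COLLEGE][HIGH_INCOME]

p_high = high_income_total / total                      # (a) marginal
p_high_given_college = college_and_high / college_total  # (b) conditional
p_college_given_high = college_and_high / high_income_total  # (c) conditional

# (d) Under independence, the conditional in (b) would equal the marginal in (a).
independent = math.isclose(p_high, p_high_given_college, rel_tol=1e-3)

print(f"(a) {p_high:.4f}  (b) {p_high_given_college:.4f}  (c) {p_college_given_high:.4f}")
print("independent:", independent)
```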
Question #8
• Identify a situation relating to your work or business interests in which statistical sampling might be (or has been) helpful.
• (a) Describe the population and indicate how a sample could be chosen.
• (b) Identify a population parameter of interest and indicate how a sample statistic could shed light on this unknown.
• (c) Explain the concept of the sampling distribution of this statistic for your particular example.
Question #9
Suppose an investment has the following probabilities associated with levels of profit:

PROFIT    PROBABILITY
$300      0.05
$200      0.25
$100      0.35
$0        0.20
-$50      0.10
-$100     0.05

(a) Find the expected return (value) and the risk (standard deviation) of this investment.
(b) Draw the probability distribution (bar chart) of profit for this investment. Indicate the mean and standard deviation.
(c) If you had a choice between this investment and a $100.00 gift, which would you prefer and why?
(d) Suppose you had to decide between this investment and one with an expected return of $150.00 and standard deviation (risk) of $200. What would your decision depend on?
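For part (a), the expected value is the probability-weighted average of the profits, and the standard deviation is the square root of the probability-weighted squared deviations from that average. The sketch below is a minimal check of a hand calculation; the variable names are illustrative.

```python
import math

# (profit in dollars, probability) pairs from the table in the question.
outcomes = [
    (300, 0.05),
    (200, 0.25),
    (100, 0.35),
    (0, 0.20),
    (-50, 0.10),
    (-100, 0.05),
]

# A valid probability distribution must sum to 1.
total_probability = sum(p for _, p in outcomes)

# Expected value: probability-weighted average of the profits.
expected_profit = sum(x * p for x, p in outcomes)

# Variance: probability-weighted squared deviations from the expected value.
variance = sum((x - expected_profit) ** 2 * p for x, p in outcomes)
risk = math.sqrt(variance)  # standard deviation

print(f"expected profit = {expected_profit:.2f}")
print(f"variance = {variance:.2f}, standard deviation = {risk:.2f}")
```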
Question #10
• A city decides to determine the mean expenditure per tourist per visit. A random sample of 100 finds that the average expenditure is $800. The standard deviation of expenditures for all tourists is $120.
• A) What is the standard deviation of the sample mean, given that the standard deviation of the whole population is $120 and the number of people sampled is 100?
• B) What is a 95% confidence interval for the mean expenditure per tourist? Provide an interpretation.
• C) If the city wants to determine the average expenditure within plus or minus $20, how many people does it need to sample?
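With the population standard deviation known, the interval uses the normal critical value, and the sample size follows from solving the margin-of-error formula for n. The sketch below is one possible check. It assumes a 95% confidence level for part C as well, which the question does not state explicitly.

```python
import math
from statistics import NormalDist

sigma = 120.0         # known population standard deviation
n = 100               # sample size
sample_mean = 800.0
confidence = 0.95

# Two-sided normal critical value (about 1.96 for 95%).
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# A) Standard deviation of the sample mean (standard error).
standard_error = sigma / math.sqrt(n)

# B) Confidence interval: sample mean plus or minus z times the standard error.
margin = z * standard_error
ci = (sample_mean - margin, sample_mean + margin)

# C) Smallest n with z * sigma / sqrt(n) <= tolerance, rounded up.
tolerance = 20.0
required_n = math.ceil((z * sigma / tolerance) ** 2)

print(f"A) standard error = {standard_error:.2f}")
print(f"B) 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"C) required sample size = {required_n}")
```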
Question #11
• In border towns such as Detroit and Buffalo, Canadian coins frequently end up in business cash registers. Canadian denominations are identical to U.S. denominations, and the coins are virtually identical in size, color, and weight. At present, the exchange rate favors the U.S., and banks encourage their customers to sort out the Canadian coins.
• A Buffalo bank has been monitoring the deposits of one of its large customers, a supermarket. The bank has recorded on 45 days the face value of Canadian coins per $100 deposited. For these 45 days, the average amount was $3.46, with a standard deviation of $0.52. Give a 95% confidence interval for the population mean.
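Because the standard deviation here comes from the sample, a t-based interval with n - 1 = 44 degrees of freedom is a natural choice. The sketch below is a minimal illustration under that assumption. The critical value is hard-coded from a t table (about 2.015 for 44 degrees of freedom) because the Python standard library has no t quantile function; with 45 observations, the normal value 1.96 gives a very similar interval.

```python
import math

n = 45               # number of days recorded
sample_mean = 3.46   # dollars of Canadian coins per $100 deposited
sample_sd = 0.52     # sample standard deviation

# Assumption: two-sided 95% critical value read from a t table, df = 44.
t_critical = 2.015

standard_error = sample_sd / math.sqrt(n)
margin = t_critical * standard_error
ci = (sample_mean - margin, sample_mean + margin)

print(f"standard error = {standard_error:.4f}")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```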