Transcript Slide 1

What I Learned
About Assessment
From the AP Program
Dan Kennedy
Baylor School
Houston AP Teachers Meeting
November 11, 2006
True Confession of a
Veteran Mathematics
Teacher:
For many years I never thought much about
assessment.
I graded my students in what I thought was
an appropriate variety of ways:
Tests … Quizzes … and Homework.
This model had stood the test of time.
In 1986 I was invited to
become a member of the
AP Calculus Test
Development Committee.
From 1990 to 1994 I
would serve as chair.
My experience with this group changed
my views of assessment forever.
I had already learned one important fact
about classroom assessment merely by
teaching an AP course:
It changes the entire classroom dynamic
when the teacher honestly does not
know what will be on the test.
The teacher has no other
option but to teach the
students how to think for
themselves!
Why students don’t think on tests:
•Thinking takes time.
•Thinking is only necessary when you cannot
do something “without thinking.”
•If you can do something without thinking,
you can do it very well.
•Students who can do something very well
have been well-prepared.
•Therefore, if you prepare them well, your
students will proceed through your
tests without thinking!
Were AP Calculus
exams predictable?
1987 BC Exam:
1.
2.
3.
4.
5.
6.
Differential equation
Implicit Differentiation
Area/volume
Series
Particle problem
Theory problem (stretch)
Of course, this was just one exam.
But there were others like it.
But if we tried to
change anything,
teachers would
notice.
Then, in AP workshops
all over the country,
teachers would find
themselves uttering to
AP consultants the words
they dreaded most when
spoken by their students:
Will this be
on the test?
And why should teachers NOT ask that
question?
It is how the game is played.
•We show the students how to do math.
•We let them practice at it for a while.
•Then we give them a test to see how
well they can mimic what we did.
The game is won and lost for BOTH of us
on test day.
This was just another example of the
educational paradigm that was leading
my students not to think on tests!
But how can teachers change the game if
we want our students to succeed?
Teachers have one
secret weapon:
We define what it
means to succeed.
We control the
grade!
Something I learned about assessment
from the AP program:
It is perfectly OK to scale grades!
AP Grade Conversion Chart
Calculus AB

Composite Score Range*    AP Grade
75−108                    5
58−74                     4
40−57                     3
25−39                     2
0−24                      1
*The candidates' scores are weighted
according to formulas determined by the
Development Committee to yield raw
composite scores; the Chief Faculty
Consultant is responsible for converting
composite scores to the 5-point AP scale.
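As a minimal sketch, the conversion chart can be read as a simple lookup. The cut points below are copied from the chart; the names CUTOFFS and ap_grade are hypothetical, chosen only for this illustration.

```python
# Lowest composite score that earns each AP grade, copied from the
# Calculus AB conversion chart (composite scores run from 0 to 108).
CUTOFFS = [(75, 5), (58, 4), (40, 3), (25, 2), (0, 1)]
MAX_COMPOSITE = 108


def ap_grade(composite: int) -> int:
    """Return the AP grade (1 to 5) for a composite score from 0 to 108."""
    if not 0 <= composite <= MAX_COMPOSITE:
        raise ValueError("composite score must be between 0 and 108")
    # CUTOFFS is sorted from the highest floor down, so the first
    # floor the score reaches decides the grade.
    for floor, grade in CUTOFFS:
        if composite >= floor:
            return grade
    raise AssertionError("unreachable: the cutoffs cover every valid score")
```

For example, ap_grade(75) gives 5 and ap_grade(74) gives 4.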
75% = 5
At our school, 75% is not a good grade. In
fact, 65% is a minimal pass.
Is this reasonable? Think about it.
•The all-time NBA record for field goal
percentage in a season is 72.7%.
•The all-time record batting average for
major league baseball is .440 (44%).
•A salesperson who makes a sale on 75%
of first contacts is a genius.
So how can we expect 75% success from
someone who is just learning?
If the AP exam were constructed so that
the low-to-average student could get
75% of the maximum points,
(a) it wouldn’t be much of a test, and
(b) the distribution would be skewed
rather than normal.
[Figure: plot with labeled values 99, 92, 82, 71, 30, 20, 75, 93]
An Important Disclaimer:
Scaling grades is not about building self-esteem.
Scaling grades is about teaching
mathematics.
Assessment should support your efforts to
teach your students mathematics.
It should not get in the way.
ClrHome:FnOff
PlotsOff:ClrTable:ExprOff
6→Xmin:100→Xmax
0→Ymin:124→Ymax
0→Xscl:0→Yscl
Input "RAW SCORE: ",A
Input "CURVED TO: ",B
Input "RAW SCORE: ",C
Input "CURVED TO: ",D
(B−D)/(A−C)→M
"round(MX+B−AM,0)"→Y1
IndpntAsk
DispGraph
Text(1,1,"TRACE OR USE TABLE")
Text(7,1,"TO ENTER RAW")
Text(13,1,"SCORES.")
Scaling grades on the TI-84 Plus
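The calculator program fits a straight line through two (raw score, curved score) pairs and reads every other grade off that line. A rough Python equivalent might look like the sketch below. The function name make_scale and the sample anchor points are hypothetical, and rounding halves upward is an assumption about how the calculator's round( behaves.

```python
import math


def make_scale(raw1: float, curved1: float, raw2: float, curved2: float):
    """Build a function mapping raw scores to curved scores along the
    straight line through (raw1, curved1) and (raw2, curved2)."""
    if raw1 == raw2:
        raise ValueError("the two raw scores must differ")
    slope = (curved1 - curved2) / (raw1 - raw2)  # M in the calculator program

    def scale(raw: float) -> int:
        # Same expression as Y1 in the program: M*X + B - A*M.
        value = slope * raw + curved1 - raw1 * slope
        # Round halves upward (assumed to match the calculator).
        return math.floor(value + 0.5)

    return scale


# Hypothetical anchors: a raw 90 becomes 95, a raw 50 becomes 70.
scale = make_scale(90, 95, 50, 70)
```

With these anchors, scale(74) returns 85. Only two decisions are needed from the teacher, and every other score follows from them.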
Some things that ETS worried about that
I didn’t:
• r-biserial
•Content validity
•Speededness
•True score
•Grading rubrics
r-biserial (r-bis)
“A correlation coefficient relating
performance on a test question and
performance on the measure used as a
criterion. It is an index of discrimination
measuring the extent to which examinees
who score high on the measure used as
the criterion tend to get the question
right and those who score low tend to get
it wrong.”
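To make the definition concrete, here is a small sketch of the standard textbook biserial formula: the difference between the mean criterion scores of those who got the question right and those who got it wrong, divided by the standard deviation of all criterion scores, times pq/y. Here p is the proportion answering correctly, q = 1 − p, and y is the height of the standard normal curve at the point that splits it p to q. Nothing in the source says ETS computes it exactly this way, and the sample data are invented for illustration.

```python
from statistics import NormalDist, mean, pstdev


def biserial(item_correct: list[bool], criterion: list[float]) -> float:
    """Biserial correlation between one test question and a criterion score."""
    if len(item_correct) != len(criterion):
        raise ValueError("need one criterion score per examinee")
    right = [s for ok, s in zip(item_correct, criterion) if ok]
    wrong = [s for ok, s in zip(item_correct, criterion) if not ok]
    if not right or not wrong:
        raise ValueError("need at least one right and one wrong answer")
    sd = pstdev(criterion)
    if sd == 0:
        raise ValueError("criterion scores must not all be equal")
    p = len(right) / len(criterion)  # proportion answering correctly
    q = 1 - p
    # Height of the standard normal curve at the point that splits it p : q.
    y = NormalDist().pdf(NormalDist().inv_cdf(p))
    return (mean(right) - mean(wrong)) / sd * (p * q / y)


# Invented data: six examinees, their total scores, and whether each
# one answered the question correctly.
scores = [90, 85, 60, 80, 55, 40]
answered_right = [True, True, True, False, False, False]
r_bis = biserial(answered_right, scores)
```

A clearly positive value means strong students tended to get the question right. A negative value means the question worked against the students who did best on the criterion.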
1969 Multiple-choice question #26:
$\int_0^1 \sqrt{x^2 - 2x + 1}\,dx$ is
(A) $-1$
(B) $-\frac{1}{2}$
(C) $\frac{1}{2}$
(D) 1
(E) none of the above
The answer is (C).
AB Stats:
A 3%
B 57%
C 7%
D 3%
E 20%
BC Stats:
A 1%
B 70%
C 11%
D 2%
E 9%
Projected Chimpanzee Stats:
A 20%
B 20%
C 20%
D 20%
E 20%
Correct responses to problem #26:
AB 7%
BC 11%
Chimps 20%
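A short worked derivation, added as an aside, shows why (C) is correct. It assumes the integrand in question #26 is $\sqrt{x^2 - 2x + 1}$. The second line shows where response (B) presumably comes from: dropping the absolute value.

```latex
\begin{align*}
\int_0^1 \sqrt{x^2 - 2x + 1}\,dx
  &= \int_0^1 \sqrt{(x - 1)^2}\,dx
   = \int_0^1 |x - 1|\,dx
   = \int_0^1 (1 - x)\,dx
   = \left[ x - \tfrac{x^2}{2} \right]_0^1
   = \tfrac{1}{2}, \\
\int_0^1 (x - 1)\,dx
  &= \left[ \tfrac{x^2}{2} - x \right]_0^1
   = -\tfrac{1}{2}.
\end{align*}
```

On the interval from 0 to 1 the quantity x − 1 is never positive, so the square root equals 1 − x and the integral must be positive.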
Content Validity
“Validity is the extent to which a test
measures what it is supposed to
measure. The content validity of an (AP)
test is the extent to which the content
of the test represents a balanced and
adequate sampling of the universe of
content in which the test is intended to
measure achievement.”
The AP Calculator Experiment (1983-84)
In 1983 the AP Calculus
Committee decided to
allow (but not require)
the use of scientific
calculators on the AP
Calculus examinations.
This was not to be a very happy debut
for technology on the AP stage.
AP readers found that students were
losing points on the free-response section
because of calculator misuse.
The calculators affected the scores.
But calculators were not being tested!
This compromised the content validity.
The committee had two choices:
1. Forbid calculators and test as usual;
2. Require calculators and alter the test.
They chose to forbid the calculators.
One of my Precalculus tests from 1990:
Note the emphasis
on computation.
Note that there is
nothing here to
suggest that any
of this stuff is
worth knowing!
A recent test on the same functions:
PRECALCULUS
TEST 4
1.
2.
3.
4.
We deposit $5000 in an account earning 6.8% annual interest. Find the worth of the account in
10 years if the interest is compounded
a) annually.
b) monthly.
c) daily.
7.
A bacteria population grows according to the exponential model
$A(t) = 150e^{0.12t}$, where A(t) is the population after t days.
Find all real numbers x which satisfy the following equations:
a) $x = \log_5 13$
b) $7^x = 126$
c) log x  log( 2 x  1)  log 21
x2
 27
d) 3
e) $\log_x 16 = 2$
f) log(log x) = 0
3
a) What is the initial population at time t = 0?
b) How many days will it take for the population to triple in size?
c) Will this model be valid for arbitrarily large values of t ? Explain.
2
Find all three roots of the equation x 2  x . You can find the first one graphically, but the
other two (complex) must be found algebraically.
8.
a) Find the vertical asymptotes of the graph of $y = \frac{3x}{x^2 - 4}$.
b) Find the horizontal asymptote of the graph of the function in (a).
c) Give an example of a rational function that has a vertical asymptote of
x = 5 and a horizontal asymptote of y = 1.
A certain rumor spreads through a small town so that the proportion of the population that has
heard the rumor after t days is given by the formula $P(t) = \frac{e^t}{e^t + 7}$. Consider the graph of the
function in answering the following questions.
a) What proportion of the population has heard the rumor at time t = 0?
b) During what day will 90% of the population come to have heard the rumor?
c) During what day does the rumor seem to be spreading the fastest?
Let f (x)  log(x  3) and g(x)  2 x. Find:
9.
a) the domain of f.
b) the domain of g.
c) the range of f.
d) the range of g.
e) the inverse function of g.
f) the vertical asymptote of f.
g) the horizontal asymptote of g.
h) Does f have a horizontal asymptote?
5.
6.
A single function F satisfies all of the following properties:
a) F(1) = 0
b) F(a) + F(b) = F(ab)
c) F(a) – F(b) = F(a ÷ b)
d) F(10) = 1.
Can you guess what the function is by considering its properties?
Which of the following graphs represents:
1) exponential growth?
2) logistic growth?
4) linear depreciation?
5) logarithmic growth?
a)
b)
3) exponential decay?
c)
Still not perfect,
but a better test.
Speededness
“The appropriateness of a test in terms
of the length of time allotted. For most
purposes, a good test will make full use
of the examination period but not be so
speeded that an examinee’s rate of work
will have an undue influence on the
score he receives.”
Allowing for speededness
Exam Format for AP Calculus AB

Exam Format    % of Grade    Number of Questions    Minutes Allotted    Calculator Use
Section I      50
  Part A                     28                     55                  no calculator
  Part B                     17                     50                  graphing calculator required
Section II     50
  Part A                     3 problems             45                  graphing calculator required
  Part B                     3 problems             45                  no calculator
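One rough way to look at speededness in this format is the time allowed per question in each part. The sketch below only does that arithmetic on the figures listed above. The names PARTS and minutes_per_question are hypothetical.

```python
# (section, part, number of questions, minutes allotted), copied from
# the AP Calculus AB exam format figures.
PARTS = [
    ("Section I", "Part A", 28, 55),
    ("Section I", "Part B", 17, 50),
    ("Section II", "Part A", 3, 45),
    ("Section II", "Part B", 3, 45),
]


def minutes_per_question(questions: int, minutes: int) -> float:
    """Average time available for each question in one part of the exam."""
    return minutes / questions


for section, part, questions, minutes in PARTS:
    pace = minutes_per_question(questions, minutes)
    print(f"{section} {part}: {pace:.1f} minutes per question")
```

The same calculation works for a classroom test: divide the period length by the number of questions.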
True Score
“A score entirely free of errors of
measurement. True scores are
hypothetical values never obtained in
actual testing. A true score is
sometimes defined as the average score
that would result from an infinite series
of measurements with the same or
exactly equivalent tests, assuming no
practice effect or change in the
examinee during the testings.”
Why teachers don’t need to worry about
true score:
We can assess our students all year long!
The more often the better.
Sorry, kids.
Yessss!
AP Calculus Grading Rubrics
AP® CALCULUS AB
2004 SCORING GUIDELINES
Question 1
If the AP readers
can give partial
credit fairly to
250,000 students, I
ought to be able to
do it for my own
students.
In AP Calculus, I
can even use the AP
rubrics to do it.
Traffic flow is defined as the rate at which cars pass through an intersection, measured in cars
per minute. The traffic flow at a particular intersection is modeled by the function F defined by
$F(t) = 82 + 4\sin\left(\frac{t}{2}\right)$ for $0 \le t \le 30$,
where F(t) is measured in cars per minute and t is measured in minutes.
(a) To the nearest whole number, how many cars pass through the intersection over the 30-minute period?
(b) Is the traffic flow increasing or decreasing at t = 7? Give a reason for your answer.
(c) What is the average value of the traffic flow over the time interval $10 \le t \le 15$? Indicate
units of measure.
(d) What is the average rate of change of the traffic flow over the time interval $10 \le t \le 15$?
Indicate units of measure.

(a) $\int_0^{30} F(t)\,dt = 2474$ cars
    3 points: 1 : limits, 1 : integrand, 1 : answer
(b) $F'(7) = -1.872$ or $-1.873$
    Since $F'(7) < 0$, the traffic flow is decreasing at t = 7.
    1 point: answer with reason
(c) $\frac{1}{5}\int_{10}^{15} F(t)\,dt = 81.899$ cars/min
    3 points: 1 : limits, 1 : integrand, 1 : answer
(d) $\frac{F(15) - F(10)}{15 - 10} = 1.517$ or $1.518$ cars/min$^2$
    1 point: answer
Units of cars/min in (c) and cars/min$^2$ in (d)
    1 point: units in (c) and (d)
Copyright © 2004 by College Entrance Examination Board. All rights reserved.
Visit apcentral.com (for AP professionals) and www.collegeboard.com/apstudents (for AP students and parents).
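The numerical answers in the scoring guidelines can be checked with a few lines of code. This is only a sketch: it uses the exact antiderivative 82t − 8cos(t/2) rather than a calculator's numerical integration, and every name in it is hypothetical.

```python
import math


def F(t: float) -> float:
    """Traffic flow in cars per minute, as defined in Question 1."""
    return 82 + 4 * math.sin(t / 2)


def F_prime(t: float) -> float:
    """Derivative of F: the derivative of 4 sin(t/2) is 2 cos(t/2)."""
    return 2 * math.cos(t / 2)


def F_integral(a: float, b: float) -> float:
    """Exact integral of F from a to b via the antiderivative 82t - 8cos(t/2)."""
    def antiderivative(t: float) -> float:
        return 82 * t - 8 * math.cos(t / 2)
    return antiderivative(b) - antiderivative(a)


total_cars = F_integral(0, 30)                 # part (a), cars
slope_at_7 = F_prime(7)                        # part (b), negative means decreasing
average_flow = F_integral(10, 15) / (15 - 10)  # part (c), cars/min
average_rate = (F(15) - F(10)) / (15 - 10)     # part (d), cars/min^2
```

Rounded to the nearest whole number, total_cars comes to 2474. The other three values agree with the rubric to three decimal places.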
AP Calculus Exams are:
•Designed to test knowledge
•Designed to test cleverness
•Scaled reasonably
•Not made up by the teacher
•Open assessments
•Comprehensive assessments (valid)
•Honest about technology
Two Fundamental Principles:
1. Assess what you value.
2. Value what you assess.
Some problems with traditional tests:
•They assess only a fraction of what we value.
•They depend too much on luck.
•There is often no feedback (as with final exams).
•They are usually taken alone. (Is this what we value?)
•They are usually timed. (Is this a good model for quality work?)
•They are frequently taken under artificial, stressful conditions.
•They are dependent on teacher stimulus.
•They are often devoid of creativity (if students are “prepared”).
•They favor one narrow kind of student performance.
•Success is usually short-term and non-transferable.
•The emphasis in the end is what the student can NOT do.
•They can inhibit further learning.
Some assessment strategies I like:
•Assess what you value and value what you assess!
•Assess often, with different kinds of assessments.
•Give meaningful and prompt feedback.
•Give partial credit for partially correct work.
•Explain all your expectations to your students from the start.
•Test diligence, knowledge, and cleverness in focused ways.
•Encourage creativity through your assessments.
•Scale grades to control the standard deviation.
•Only fail students who are failures. Keep everyone in the game.
•Encourage collaboration in class and on homework.
•Assess diligence. Find a way to grade homework frequently.
•Try portfolios.
•Remember: This is not about self-esteem. It’s about teaching
mathematics to all your students!
Rebecca
Flake’s
Portfolio
Entry
Rebecca Flake
Students need to hand in a portfolio
of items of their own choosing.
The main point of this assessment is
that they are not responding to a
stimulus from me (as in a test or a
quiz).
My primary directive for student
portfolio entries is this:
Give me evidence of your learning
that I otherwise would not have!
This was my first year to be a peer tutor, and I enjoyed
helping the girls in the dorm a lot. Last night, though, I
finally saw the importance of my peer tutoring. My
roommate came in at 10:00 extremely upset over her
Precalculus test that was the next day. I calmed her
down and told her that I would help her if I could. Carrie,
who had been in the play, had gotten behind in her work,
so she didn’t understand what they were doing. She
showed me the problem. I knew the answer, but I wasn’t
sure how to explain it to her in a way that was not
confusing. I thought about it for a while, and I ended up
trying several approaches (with Clara’s help) that I had
learned in Calculus, until I finally got through to her.
Then I made her work a few problems for me, and she
did them perfectly. She understood! I was so happy to be
able to help her that I had forgotten I was supposed to
be studying for my own Calculus test. She was so happy
she understood that she began to cry. She really began
to cry. It’s great to be able to use the things you have
learned to help other people learn too.
A happy footnote:
Carrie really did
understand.
She scored 93 on the
Precalculus test the
following day – a
personal best for her,
and a full 9 points
above the class
average.
Actress Carrie
E-mail me at:
[email protected]
Or visit the Baylor
School web site at
www.baylorschool.org.
Click on me under
Faculty and link to my
home page.