
Quantitative Methods
of Data Analysis
Bill Menke, Instructor
Natalia Zakharova, TA
Lecture 2
MatLab Tutorial
and
Issues associated with Coding
MatLab Fundamentals
Most important Data Types
Numerical:
Scalars – single value
Vectors – Column or row of values
Matrices – two-dimensional tables of values
Text:
Character string
Scalars
A number you enter
a = 1.265;
A predefined number
b = pi;
The result of a calculation
c = a*b;
Vectors
MatLab can manipulate both
column-vectors, such as
1.4
2.3
0.1
and row-vectors, such as
9.1, 7.1, 4.2, 8.9
But my advice to you is
only use column-vectors
Because it's so easy to introduce a bug by doing an operation on
one that should have been done on the other.
Use the transpose operator ' to immediately convert any row-vector that you must create into a column-vector
Transpose Operator
Swap rows and columns of an array, so that
1
2
3
4
becomes [ 1, 2, 3, 4 ]
(and vice versa)
Standard mathematical notation: a^T
MatLab notation: a'
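A minimal sketch of the transpose operator in use. The variable names (r, c, r2) are made up for illustration and are not from the lecture.

```matlab
% A row-vector, created only because it is easy to type ...
r = [1, 2, 3, 4];
% ... and converted to a column-vector with the transpose operator
c = r';
% size() returns [number of rows, number of columns]
size_r = size(r);   % 1 row, 4 columns
size_c = size(c);   % 4 rows, 1 column
% transposing the column-vector gives back the original row-vector
r2 = c';
```

Checking size() right after creating a vector is a cheap way to confirm that it really is a column-vector.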
Vector
A vector you enter
a = [1.88, 7.22, 5.31, 7.53]';
(Note the immediate conversion to a column-vector)
Result of a calculation
b = 2 * a;
The result of a function call
c = sort(a);
Matrix
A matrix you enter
A = [ [1,2,3]', [4,5,6]', [7,8,9]' ];
That’s the matrix
1 4 7
2 5 8
3 6 9
by the way …
Result of a calculation
B = 2 * A;
The result of a function call
C = zeros(3,3);
Character strings
You type in a quoted sequence of characters:
s = 'hi there';
Occasionally, the result of a function call:
capS = upper(s);
That’s ‘HI THERE’,
by the way …
Arithmetic
a = 2; % a scalar
b = 2; % a scalar
c = [1, 2, 3]'; % a column-vector
d = [2, 3, 4]'; % another column-vector
M = [ [1,0,0]', [0,2,0]', [0,0,3]' ]; % a matrix
e = a*b; % a scalar
f = c'*d; % the dot-product, a scalar
g = M*d; % a column-vector
h = d'*M*d; % a scalar
where M is the matrix
1 0 0
0 2 0
0 0 3
Normal rules of linear algebra apply, which means that the type of
the result depends critically on what's on the r.h.s. – and on its
order! Lots of room for bugs here!
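To make the point about order concrete, here is a small sketch using the vectors c and d from this slide. The names inner and outer are illustrative only.

```matlab
c = [1, 2, 3]';
d = [2, 3, 4]';
% (1x3)*(3x1): a scalar, the dot product 1*2 + 2*3 + 3*4
inner = c'*d;
% (3x1)*(1x3): a 3x3 matrix, with element (i,j) equal to c(i)*d(j)
outer = c*d';
size_inner = size(inner);
size_outer = size(outer);
% c*d, a (3x1) times a (3x1), is not defined and MatLab reports an error
```

Moving the transpose from one operand to the other turns a scalar into a 3x3 matrix, which is exactly the kind of bug the slide warns about.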
Element access
Suppose
A = [ [1,2,3]', [4,5,6]', [7,8,9]' ];
1 4 7
2 5 8
3 6 9
Then A(2,3) is the element of A at row=2, col=3, which is 8
b = A(2,3); sets b to 8
A(2,3) = 10; resets A(2,3) to 10
1 4 7
2 5 10
3 6 9
Then A(:,2) is the second column of A
b = A(:,2);
4
5
6
And A(3,:) is the third row of A
c = A(3,:); c=[3, 6, 9 ];
but we agreed, no row vectors
d = A(3,:)';
3
6
9
More on :
Suppose
A = [ [1,2,3]', [4,5,6]', [7,8,9]' ];
1 4 7
2 5 8
3 6 9
Then A(1:2,1:2) extracts a range of rows and columns
1 4
2 5
Note that a quick way to make a vector with
regularly-spaced elements is:
dx = 0.01;
N=100;
t = dx*[1:N]';
which gives the column-vector
0.01
0.02
…
0.99
1.00
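A short sketch that checks the colon syntax described above. The names B, C, n_t, t_first and t_last are introduced here for illustration only.

```matlab
A = [ [1,2,3]', [4,5,6]', [7,8,9]' ];
% rows 1 to 2 and columns 1 to 2 of A
B = A(1:2,1:2);
% rows 2 to 3, all columns (a bare colon means "everything")
C = A(2:3,:);

% regularly-spaced column-vector 0.01, 0.02, ..., 1.00
dx = 0.01;
N = 100;
t = dx*[1:N]';
n_t = length(t);     % number of elements in t
t_first = t(1);      % first element
t_last = t(N);       % last element
```

Inspecting the length and the first and last elements is a quick guard against off-by-one mistakes in the range.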
Logical functions
MatLab assigns TRUE the value 1 and FALSE the value 0,
so
( 1 > 2 ) equals 0
( 1 < 2 ) equals 1
a = [1, 2, 3, 4, 5, 4, 3, 2, 1]';
b = (a>=4); sets b to [0, 0, 0, 1, 1, 1, 0, 0, 0]'
sum( (a>=4) ); is the number of elements in the vector a
that are equal to or greater than 4
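The same counting idea can be turned into a percentage, sketched here with the vector a from this slide. The names mask, count and pct are illustrative.

```matlab
a = [1, 2, 3, 4, 5, 4, 3, 2, 1]';
% 1 where the test is true, 0 where it is false
mask = (a >= 4);
% summing the 1s counts the elements that pass the test
count = sum(mask);
% percent of elements that pass the test
pct = 100 * count / length(a);
```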
Logical tests
Blocks of MatLab code that are executed
only when a test is true.
One handy use is turning on or off bits of
code intended primarily for debugging
Here it gets plotted
doplotone=1;
if (doplotone)
plot(t,d);
end
Here it doesn’t
doplotone=0;
if (doplotone)
plot(t,d);
end
To Loop or Not to Loop
a=[1, 2, 3, 4, 3, 2, 1]';
b=[3, 2, 1, 0, 1, 2, 3]';
N=length(a);
Dot product using
MatLab syntax
c = a'*b;
Dot product using loop
c = 0;
for i = 1:N
c = c + a(i)*b(i);
end
You should avoid loops except in
cases where
No MatLab syntax is available to
provide the functionality in a
simpler way
Available MatLab syntax is so
inscrutable that a loop more clearly
communicates your intent
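As a second illustration of the same trade-off, here is a sketch that computes day-to-day differences both ways. The data values and the names d, r_loop and r_vec are made up for this example.

```matlab
d = [5, 7, 12, 9, 8, 8]';    % made-up values
N = length(d);

% loop version: each element minus the one before it
r_loop = zeros(N-1,1);
for i = 2:N
    r_loop(i-1) = d(i) - d(i-1);
end

% vector version: elements 2..N minus elements 1..N-1
r_vec = d(2:N) - d(1:N-1);
```

Both versions give the same answer; which one communicates the intent more clearly is a judgment call of the kind described above.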
A Tutorial
using the Neuse River Hydrograph
[Figure: rain vs. time and discharge vs. time]
Rain falls and the river rises, the discharge
quickly increases
After the rain, the river falls, the discharge
slowly decreases
So, is the river
more often
falling than
rising ?
What would constitute an
appropriate analysis ?
Find, for the 11 year period, the percent of days
that the discharge is increasing*, compare it to
50%.
Make a histogram of the rate of increase and
decrease of discharge and see whether it is
centered around zero or some other number.
* Rising today if today’s discharge minus
yesterday’s discharge is positive.
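The definition of "rising" can be sketched on a handful of made-up discharge values (not the Neuse data). The built-in diff function returns each element minus the previous one; the variable names are illustrative.

```matlab
% made-up discharge values, one per day
d = [10, 30, 24, 19, 15, 12, 40, 31, 25, 20]';
% today minus yesterday; one element shorter than d
rate = diff(d);
% 1 on days when the discharge rose, 0 otherwise
rising = (rate > 0);
n_rising = sum(rising);
% percent of days on which the discharge rose
pct_rising = 100 * n_rising / length(rate);
```

In this toy series the discharge rises on 2 of the 9 day-to-day intervals, so the percentage is well below 50%.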
Steps
Import the Neuse hydrograph data
Convert units to those we're most familiar with
Plot discharge vs time, examine it for errors
Compute discharge rate (today minus yesterday)
Plot rate vs time, examine it for errors
Count up % of days rate is positive
Output the % of days
Compute histogram of rates and plot it
Tricks: work first with a subset of the data
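The steps above might be sketched as follows. This is only an outline under stated assumptions: the data file and its units are not given here, so a synthetic hydrograph (quick rise every tenth day, slow decay otherwise) stands in for the imported data, and the unit conversion is omitted. All variable names are hypothetical.

```matlab
doplot = 0;                  % set to 1 to turn on the plots

% ASSUMPTION: synthetic stand-in for the imported discharge data
Ndays = 200;
t = [1:Ndays]';
d = zeros(Ndays,1);
d(1) = 100;
for i = 2:Ndays
    if (mod(i,10) == 0)
        d(i) = d(i-1) + 50;  % rain day: quick rise
    else
        d(i) = 0.9*d(i-1);   % dry day: slow decay
    end
end

% trick: work first with a subset of the data
Nsub = 50;
rate_sub = diff(d(1:Nsub));
n_rising_sub = sum(rate_sub > 0);

% plot discharge vs time, examine it for errors
if (doplot)
    plot(t, d);
end

% discharge rate (today minus yesterday) for the full dataset
rate = diff(d);
pct_all = 100 * sum(rate > 0) / length(rate);

% histogram of rates: counts in 20 equally-spaced bins
[counts, centers] = hist(rate, 20);
if (doplot)
    bar(centers, counts);
end
```

Calling hist with output arguments returns the bin counts without drawing anything, so the histogram can be inspected numerically before it is plotted.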
Finding and Using Documentation
The MatLab Web Site is one place where you can get a description of syntax, functions, etc.
Example 1: the LENGTH command
Can be very useful in finding exactly what you
want if you’ve only found something close to what
you want!
Example 2: the SUM command
Some commands
have long,
complicated
explanations. But
that’s because they
can be applied to
very complicated
data objects. Their
application to a
vector is usually
short and sweet.
...
(two more pages below)
Coding Advice
Advice #1
Think about what you want to do before
starting to type in code!
Block out on a piece of scratch paper the
necessary steps
Without some forethought, you can code for
an hour, and then realize that what you're
doing makes no sense at all.
Advice #2
Sure, cannibalize a program to make a new
one …
But keep a copy of the old one …
And make sure the names are sufficiently
different that you won't confuse the two …
Advice #3
Be consistent in the use of variable names
amin, bmin, cmin, minx, miny, minz
guaranteed to cause trouble
Don’t use variable names that can be too
easily confused, e.g. xmin and minx.
(Especially important because it can interact
disastrously with MatLab automatic
creation of variables. A misspelled variable
becomes a new variable).
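A tiny sketch of how a misspelling goes unnoticed. The names xmin and xmim are chosen only to illustrate the problem.

```matlab
xmin = 0;       % the variable we mean to use
xmim = -5;      % typo for xmin: no error, MatLab just creates xmim
% xmin still holds its old value, so later code silently uses 0
still_old = xmin;
% exist(name,'var') returns 1 when a variable of that name exists
made_new = exist('xmim', 'var');
```

No error or warning is issued when the misspelled name is assigned, which is why confusable names are so risky.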
Advice #4
Build code in small sections, and test each
section thoroughly before going on to the
next.
Make lots of plots.
Advice #5
Test code on smallish simple datasets before
running it on a large complicated dataset
Build test datasets with known properties. Test
whether your code gives the right answer!
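One way to build test datasets with known properties, sketched for a percent-of-days-rising calculation. The datasets and variable names are made up, and the right answers are known in advance.

```matlab
% Test dataset 1: steadily rising, so 100% of days should be rising
d_up = [1:10]';
pct_up = 100 * sum(diff(d_up) > 0) / (length(d_up) - 1);

% Test dataset 2: steadily falling, so 0% of days should be rising
d_down = [10:-1:1]';
pct_down = 100 * sum(diff(d_down) > 0) / (length(d_down) - 1);

% stop with an error if the code gives the wrong answer
assert(pct_up == 100);
assert(pct_down == 0);
```

If either assert fails, the bug is in the code rather than in the data, which is much easier to sort out on ten values than on eleven years of them.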
Advice #6
Don’t be too clever!
Inscrutable code is very prone to error.
Advice #7
use comments to communicate the BIG PICTURE
Which set of comments gives you the most sense of what’s going on?
% c is the dot product of a and b
c = 0;
for i = 1:N
c = c + a(i)*b(i);
end
% set c to zero
c = 0;
% loop from one to N
for i = 1:N
% add a times b to c
c = c + a(i)*b(i);
% end of the loop
end
Advice #8
BUGS – DON’T MAKE THEM
(an ounce of prevention is worth a pound of cure)
Practices that reduce the likelihood of bugs are
almost always worthwhile, even though they may
seem to slow you down a bit …
They save time in the long run, since you will
spend much less time debugging …
By the way, cutting-and-pasting code, especially when it
must then be modified by changing variable names, is a
frequent source of bugs, even though it's so tempting …