Patterns in Data - Supercomputing Challenge

Download Report

Transcript Patterns in Data - Supercomputing Challenge

Data Analysis
UNLOCKING THE SECRETS HIDDEN
IN YOUR DATA
Part 3
How to Calculate an Integral
 Taking the data in columns A and
B from an earlier example with
derivatives. Solve for distance
traveled
 Mathematically:


y = position
t = time
y   vdt
 Enter zero in C2
 Enter in C3 “=C2+B3*(A3-A2)
 Duplicate formula through row 12
This integral operation
can be thought of as a
running sum or the
summation of the area
under the curve
Integration of Velocity
Integration of Velocity
40
Velocity
35
Distance
25
Distance (miles)
Velocity, (mph)
30
20
Distance is the summation
under the velocity curve
15
10
5
0
0
2
4
6
Time (hrs)
8
10
12
Log, Semi-log Plots
 Plotting some functions log-log can make them a
straight line function. This makes it easier to do a
regression fit to the data. The distance traveled for
the constant acceleration problem becomes a
straight line as shown below
Log-Log plot of Constant
Accleration
120
100
80
60
40
20
0
Acceleration
Velocity
Distance
0
5
10
Time (secs)
15
Distance (meters), Velocity
(meters/sec), and Acceleration
(meters/sec2)
Distance (meters),
Velocity (meters/sec),
and Acceleration
(meters/sec2)
Constant Acceleration
1000
100
Acceleration
Velocity
10
Distance
1
1
10
Time (secs
Log Plot Exercise of Constant Acceleration
 Open Excel and label column A Time, B Acceleration,
C Velocity and D Distance. Then enter time from 0 to
10 in increments of 1.
 Enter in B2 through B12 a constant acceleration of
two.
 Integrate the acceleration to get velocity. Enter a
zero in C2. Then enter the formula =C2+B3*(A3-A2)
in C3. Duplicate the formula down to C12.
 Integrate the velocity to get distance. Enter a zero in
D2. Then enter the formula =D2+C3*(A3-A2) in D3.
Duplicate the formula down to D12.
Log Plot Exercise (continued)
 You should now have the
following in Excel:
 Highlight Columns A and D and
do a scatter plot.
 There are two ways to do a loglog plot. First, highlight the x axis
and the y axis and put a check in
the box for log plot.
 Second is to take the log function
of columns A (=LOG10(A3) ) and
D and plot the result.
Frequency Analysis
 Frequency analysis can identify repetitive patterns
in data.

Frequencies, cycles or periods
 For this example , we will use the hare-lynx predator
prey data
Hare Lynx Predator Prey Data
Hares, Lynx (Thousands)
160
140
120
100
80
Hares
60
Lynx
40
20
0
1845
1855
1865
1875
1885 1895
Year
1905
1915
1925
1935
Frequency Analysis, the FFT
 There is clearly a cyclical pattern to the data
 Roughly, the cycle can be calculated by counting the peaks, 9
or 10, and dividing by the years, 90. This gives a 9 to 10 year
cyclical pattern
 A more accurate way to calculate the period is to use
the Fast Fourier Transform or FFT


Transforms the data into “frequency” space
Identifies patterns with respect to frequency or periods of time
FFT of Hare/Lynx Data
 See the following exercise for calculating the FFT of
the Hare/Lynx data in Excel
 A period of about 10 yrs is shown by the frequency
analysis
Frequency Analysis
22
20
18
Magnitude
16
14
12
10
Hare FFT Magnitude
8
Lynx FFT Magnitude
6
4
2
0
0
2
4
6
8
10 12 14 16 18 20 22 24 26 28 30
Period
FFT Exercise Instructions
 SETUP -- First check to see if Excel has the Data
Analysis ToolPak installed. Pull down the Tools
menu and see if Data Analysis is an option on the
menu. If not, go to the Tools menu and click on
Add-Ins. Select the Analysis ToolPak for
installation. You may need your original Office CDRom to install the ToolPak. In the latest Office, the
installation is a bit different. Select the big Office
button in the upper left corner of the Excel window
and select Excel Options at the bottom of the
menu. Select Add-Ins and the find the Analysis
ToolPack for installation.
FFT Exercise – Step 1
 1. Import the Hare/Lynx data into the Excel
Spreadsheet. This is done by:







Selecting the Data menu and then Get External Data.
Select From Text and then change the filter to all files.
Select the hare-lynx.dat file
Select Delimited and click on Next.
Select Tab and click on Next.
Select General column data format and click on Finish.
Click on the A2 box for where you want to put the data and
click on OK.
Label the columns Date, Hares, Lynx.
FFT Exercise – Step 2
 Now setup the data for the FFT. The FFT in Excel
can only operation on powers of 2 (2n). We have 91
data points. This is between 64 (26) and 128 (27). We
could just use the first 64 data points, but one of the
tricks in data sets that are not exactly powers of 2 is
that you can just add zeros to fill in the data to make
the series a power of 2.


So add zeros in columns A, B, and C from row 93 to 129.
Then label
D column header: Hare FFT Complex
 E column header: Lynx FFT Complex


and make the columns wider.
FFT Exercise – Step 3
 Run the FFT on column B.
 Select Tools->Data Analysis->Fourier Analysis (Data>Data Analysis->Fourier Analysis in the latest Excel).
 Set the input data range to B2:B129 and the output to
D2:D129.
 Repeat for column C and put the output data in column E.
The output is in complex format with both real and
imaginary components.
FFT Exercise – Step 4
 Now we want the magnitude of the FFT instead of
the complex value.



Label column H, Hare FFT Magnitude and column I, Lynx FFT
Magnitude.
First we need to multiple the result by 2/number of samples
(ns). In this case, ns is 128. We use the Excel intrinsic IMABS to
get the magnitude of the complex number (note: this is the
sqrt(real2+imag2) in vector arithmetic). So in H2, enter the
formula “=2/128*IMABS(D2)”.
Copy this formula to H128 and over to column I.
FFT Exercise – Step 5
 Next we need to calculate the x axis values so we can plot
the results.



Label column F, FFT Frequency. The first frequency is always zero, so we
put zero in F2. What this means is that the first value in the FFT is the
average value for the data. So 62.9 is the average number of hares and 41
is the average number of Lynx.
Then to calculate the frequency, we use the formula Fs/ns where Fs is the
sampling frequency, roughly 1 sample per year in this case and ns is the
number of samples or 128 in this example. So cell F3 is 1x1/128, cell F4
would be 2x1/128 and so on.
We can enter the series by selecting Edit->Fill->Series (In the more
recent Excel, select Home-> (in Editing)->Series) while in cell F2.
Select column and enter 1/128 or .0078125 for the step value and 1 for
the stop value.
FFT Exercise – Steps 6 and 7
 Step 6. To fill in the period column, first label column G,
FFT Period. In G3 enter “=1/F3”. Copy this formula down
through row 129.
 Step 7. Plotting the data.




Select columns G, H, and I by clicking on the G, H, and I (hold down
the control key while selecting the additional columns).
Now insert a chart->scatter plot.
Edit the data source to include only through row 66. If you look at
the values in H65 and H67 and I65 and I67, you will note that the
data repeats and it repeats for all the values after row 66. This is a
characteristic of FFTs and is why you should only plot half of the
data.
Format the plot to look like the next slide.
FFT Exercise Results