Transcript Slides
Football for KMS: NFL ’01
APRIL 30TH 2008
Abhijit Kumar
Kaijia Bao
Vishal Rupani
Course Instructor: Prof. Hsinchun Chen
Agenda
ABHI
VISHAL
KAI
Data Collection
Client Relations
Final Presentation
Data Cleaning
Statistical Analysis
Final Paper
Data Import
Data Transformation
Data Mining
Objectives
Literature Overview
Conclusion
Knowledge Discovery
Statistical Analysis
Data Mining Techniques
Key Findings
KMS Demonstration
Research Objectives
Pattern identification
Descriptive Statistics
Data Mining Techniques
Prediction
Developing a strategy
Fantasy League
Literature Overview
Moneyball: The Art of Winning an Unfair Game
Michael Lewis
Las Vegas Odds
www.VegasInsider.com
NFL Fantasy League
www.Nfl.com/fantasy
Knowledge Discovery Process
DATA
Pro-Football: 3 tables, 40 columns, 82,346 rows
Lisa Ordonez: 1 table, 90 columns, 50,417 rows
TRANSFORMATION
Dependent Variables: Play Decision, Intended Player, Play Direction, Yards
Calculated Variables: GameNum, IsPlayChal, PlayZone, TotalOffTO, PlayDecision, QtrTimeLeft, HalfTimeLeft, GameTimeLeft
Independent Variables: Defense, Down, GAP, Halftime Left, Off Ydl, Offense, Play Zone, QTR, ToGo, Total Off TO
SQL 2005 IS, SQL 2005 AS
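The calculated time variables named on this slide (QtrTimeLeft, HalfTimeLeft, GameTimeLeft) could be derived along the following lines. This is a minimal sketch, not the team's actual transformation: it assumes the source rows carry a quarter number (1 to 4) and an "MM:SS" game clock, that the variables are measured in seconds, and that overtime is ignored. The function name `time_left` is hypothetical.

```python
QUARTER_SECONDS = 15 * 60  # a regulation NFL quarter lasts 15 minutes


def time_left(quarter: int, clock: str) -> dict:
    """Derive time-left variables from a quarter (1-4) and an 'MM:SS' clock.

    Assumption: all three values are expressed in seconds.
    """
    if quarter not in (1, 2, 3, 4):
        raise ValueError("this sketch only handles regulation quarters 1-4")
    minutes, seconds = clock.split(":")
    qtr_left = int(minutes) * 60 + int(seconds)
    # Quarters 1 and 3 still have one full quarter remaining in their half.
    half_left = qtr_left + (QUARTER_SECONDS if quarter in (1, 3) else 0)
    # Every quarter after the current one is still to be played in full.
    game_left = qtr_left + (4 - quarter) * QUARTER_SECONDS
    return {
        "QtrTimeLeft": qtr_left,
        "HalfTimeLeft": half_left,
        "GameTimeLeft": game_left,
    }


print(time_left(3, "07:30"))
```

Keeping the three values in one unit makes them directly comparable as model inputs, which is the only design choice this sketch commits to.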
Knowledge Discovery Process
Stages: DATA, TRANSFORMATION, PROCESSING, MINING
DATA
Pro-Football: 3 tables, 40 columns, 82,346 rows
Lisa Ordonez: 1 table, 90 columns, 53,000 rows
Dependent Variables
Calculated Variables
Independent Variables
Simple Statistics: Play Decision, Intended Player, Play Direction, Yards
Models: ID3, Neural Networks
Accuracy: Lift Charts, Classification Matrix
SQL 2005 IS, SQL 2005 AS, MS Excel 2007
Dependency Network
Dependency Network
Intended Player: Statistics
Top 3 intended players for passes for the 4 teams that played in the semi-finals
Pittsburgh: H.Ward (142), P.Burress (121), B.Shaw (44)
New England: T.Brown (143), D.Patten (93), M.Edwards (39)
St. Louis: T.Holt (133), M.Faulk (104), I.Bruce (103)
Philadelphia: J.Thrash (107), D.Staley (89), T.Pinkston (83)
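A per-team ranking like the one on this slide amounts to counting pass targets per offense and keeping the most frequent names. The sketch below shows one way to do that. The function name `top_targets`, the (offense, intended player) row layout, and the toy rows are all assumptions for illustration; none of the values come from the 2001 data set.

```python
from collections import Counter, defaultdict


def top_targets(plays, n=3):
    """Return the n most frequently targeted players for each offense.

    plays: iterable of (offense, intended_player) pairs, one per pass play.
    """
    counts = defaultdict(Counter)
    for offense, player in plays:
        counts[offense][player] += 1
    return {team: tally.most_common(n) for team, tally in counts.items()}


# Toy rows with placeholder team and player labels (hypothetical).
sample = (
    [("A", "P1")] * 4
    + [("A", "P2")] * 3
    + [("A", "P3")] * 2
    + [("A", "P4")] * 1
    + [("B", "Q1")] * 2
    + [("B", "Q2")] * 1
)

for team, ranking in top_targets(sample).items():
    print(team, ranking)
```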
Play Direction: Statistics
Direction of rushes for all plays in the 2001 season
[Chart: number of rushes (0 to 600) by direction: Left End, Left Tackle, Left Guard, Middle, Right Guard, Right Tackle, Right End]
Yardage: Statistics
Yardage during each down for Pass and Rush
[Charts: Rushes and Passes; average yards covered (0 to 9) by yards to go (1 to 10, > 10) for Down 1, Down 2, Down 3]
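The yardage statistic described on this slide (average yards covered, split by play type, down, and yards to go) can be sketched as a grouped mean. The row layout, the function names, and the toy plays below are assumptions made for illustration; the "> 10" bucket mirrors the last category on the chart axis.

```python
from collections import defaultdict


def to_go_bucket(to_go: int) -> str:
    """Map yards to go onto the chart categories 1..10 and '> 10'."""
    return str(to_go) if to_go <= 10 else "> 10"


def average_yards(plays):
    """Mean yards gained per (play type, down, yards-to-go bucket).

    plays: iterable of (play_type, down, to_go, yards) tuples.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [sum of yards, play count]
    for play_type, down, to_go, yards in plays:
        key = (play_type, down, to_go_bucket(to_go))
        totals[key][0] += yards
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Toy plays (hypothetical values, not from the 2001 season data).
toy_plays = [
    ("Rush", 1, 10, 4),
    ("Rush", 1, 10, 2),
    ("Pass", 3, 12, 9),
    ("Pass", 3, 15, 0),
]

for key, mean in average_yards(toy_plays).items():
    print(key, mean)
```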
Play Decision: Statistics
Play decisions for the 4 teams that played in the semi-finals
[Chart: number of decisions (0 to 60) by play decision type (Kneel, Field goal, 1pt extra) for New England, Philadelphia, Pittsburgh, St. Louis]
Play Decision: Analysis Overview
Discovery of what environmental and/or game factors affect play decision
Discovery of football expert knowledge through data mining
Prediction of play decisions based on game factors
Play Decision: ID3 Analysis
Play Decision: ID3 Analysis
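For readers unfamiliar with ID3, the core step is choosing the attribute whose split yields the largest information gain (the reduction in entropy of the target). The sketch below implements that textbook step only. It is not the SQL Server 2005 Analysis Services implementation used in the project, and the toy rows and the "ToGo" long/short coding are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus its weighted entropy after splitting."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def best_attribute(rows, attributes, target):
    """The attribute ID3 would split on first: the one with the highest gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Toy plays (hypothetical, for illustration only).
toy_rows = [
    {"Down": 1, "ToGo": "long", "PlayDecision": "Rush"},
    {"Down": 1, "ToGo": "short", "PlayDecision": "Rush"},
    {"Down": 2, "ToGo": "long", "PlayDecision": "Pass"},
    {"Down": 2, "ToGo": "short", "PlayDecision": "Rush"},
    {"Down": 3, "ToGo": "long", "PlayDecision": "Pass"},
    {"Down": 4, "ToGo": "long", "PlayDecision": "Field goal"},
]

print(best_attribute(toy_rows, ["Down", "ToGo"], "PlayDecision"))
```

A full ID3 tree repeats this selection recursively on each subset until the subsets are pure or no attributes remain.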
Play Decision: Accuracy
Rush Accuracy: Lift Chart
Field Goal Accuracy: Lift Chart
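A lift chart for a single decision (for example Rush or Field Goal) is typically built by ranking plays by the model's predicted probability for that decision and tracking how quickly the actual cases are captured. The sketch below computes those curve points. The function name `cumulative_gains`, the input layout, and the toy scores are assumptions; the charts in the presentation were presumably produced by the mining tool itself.

```python
def cumulative_gains(scored, steps=10):
    """Points of a cumulative gains curve for one target class.

    scored: list of (predicted_probability, is_target) pairs.
    Returns [(fraction_of_plays, fraction_of_targets_captured), ...].
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    total_targets = sum(1 for _, hit in ranked if hit)
    if total_targets == 0:
        raise ValueError("no target cases to capture")
    points = []
    for step in range(1, steps + 1):
        cutoff = len(ranked) * step // steps  # plays examined so far
        captured = sum(1 for _, hit in ranked[:cutoff] if hit)
        points.append((step / steps, captured / total_targets))
    return points


# Toy model scores for ten plays (hypothetical).
toy_scores = [
    (0.90, True), (0.80, True), (0.70, False), (0.60, True), (0.50, False),
    (0.40, False), (0.30, True), (0.20, False), (0.10, False), (0.05, False),
]

for fraction, captured in cumulative_gains(toy_scores, steps=5):
    print(f"top {fraction:.0%} of plays captures {captured:.0%} of targets")
```

A model with no predictive value would capture targets in proportion to the plays examined, so the gap between the curve and that diagonal is what the chart is read for.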
Play Decision: Classification Matrix
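A classification matrix cross-tabulates actual against predicted decisions, and per-decision recall computed from it shows how accuracy can vary by decision type. The following is a minimal sketch with hypothetical function names and toy labels, not output from the project's models.

```python
from collections import Counter


def classification_matrix(actual, predicted):
    """Return (matrix, overall accuracy); matrix[actual][predicted] = count."""
    counts = Counter(zip(actual, predicted))
    labels = sorted(set(actual) | set(predicted))
    matrix = {a: {p: counts[(a, p)] for p in labels} for a in labels}
    correct = sum(counts[(label, label)] for label in labels)
    return matrix, correct / len(actual)


def per_class_recall(matrix):
    """Share of each actual decision that the model predicted correctly."""
    recall = {}
    for label, row in matrix.items():
        row_total = sum(row.values())
        if row_total:  # skip labels that never occur as an actual decision
            recall[label] = row[label] / row_total
    return recall


# Toy labels (hypothetical).
actual = ["Rush", "Rush", "Pass", "Pass", "Field goal"]
predicted = ["Rush", "Pass", "Pass", "Pass", "Field goal"]

matrix, accuracy = classification_matrix(actual, predicted)
print(matrix)
print(accuracy, per_class_recall(matrix))
```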
Play Decision: Key Findings
Football strategy can be discovered through data rather than from domain experts
Top 3 factors affecting decision: Down, Off Ydl, Time
Model accuracy differs depending on the decision being predicted
Team-specific strategies may be discovered with more data
Play Direction: Analysis Overview
Discover each team’s strengths and weaknesses on defense and/or offense
Prediction of play directions based on game factors
Directions: Left End, Left Tackle, Left Guard, Middle, Right Guard, Right Tackle, Right End
Play Direction: Accuracy
Play Direction: Key Findings (ID3)
Intended Player: Analysis Overview
Discover each team’s favored recipient of a pass
Prediction of intended player based on game factors
Intended Player: Lift Chart
Intended Player: Key Findings
There are 400+ intended players
Not enough data to accurately predict intended players
Not enough data to gain knowledge beyond what the descriptive statistics already show
Conclusions
INTENDED PLAYERS
- Insufficient data
- No knowledge gained
- Need to increase sample size
PLAY DIRECTION
- Less accurate
- Enough data to gain knowledge
PLAY DECISION
- Accurate
- Gained knowledge
Future Direction
Increase sample set
More instances of different scenarios
Incorporate additional information
Pro-football-Reference.com
VegasInsider.com (Odds for favorites)
Extend Analysis
Nested case (Historical performance)
References
Prof. Lisa Ordóñez
Professor in Statistics
Steve Aldrich
Author of Moneyball in Football
About Football
Glossary of terms
Research Objectives
Literature Overview
Knowledge Discovery
Statistics: Intended Player
Statistics: Play Direction
Statistics: Yardage
Statistics: Play Decision
Accuracy: Lift Charts
Analysis: Play Decision
Analysis: Play Direction
Analysis: Intended Player
Conclusions
Future Directions
System Design
Backup Slide Section
Data Collection
Initial Dataset
55,000 rows, 90 columns
• Football Outsiders
• Pro-Football
Processing
• Cleaning
• Hierarchy
• Relevance
47,033 rows, 30 columns
Analysis
• Dependent: 4
• Independent: 10
• Calculated: 9
System Design
NFL KMS
FOOTBALL DATA
Model Building
NFL Season 2001
DB
Testing/Accuracy
Pattern Analysis
DEFENSE STRATEGY
METRICS
Accuracy
Performance
FIELD STRATEGY
Formations
Substitutions
Play Decisions
Yards Analysis
Yards gained on a play are used as a metric for effort
Discover how environmental and/or game factors affect players’ effort
Key Findings: Top 4 environmental factors
Off Ydl
Time
Down
Gap