Transcript Slides
Football for KMS: NFL '01
April 30th, 2008
Abhijit Kumar, Kaijia Bao, Vishal Rupani
Course Instructor: Prof. Hsinchun Chen

Agenda
ABHI, VISHAL, KAI
Data Collection, Client Relations, Final Presentation, Data Cleaning, Statistical Analysis, Final Paper, Data Import, Data Transformation, Data Mining, Objectives, Literature Overview, Conclusion, Knowledge Discovery, Statistical Analysis, Data Mining Techniques, Key Findings, KMS Demonstration

Research Objectives
- Pattern identification
- Descriptive Statistics
- Data Mining Techniques
- Prediction
- Developing a strategy
- Fantasy League

Literature Overview
- Moneyball: The Art of Winning an Unfair Game, Michael Lewis
- Las Vegas Odds: www.VegasInsider.com
- NFL Fantasy League: www.Nfl.com/fantasy

Knowledge Discovery Process
Stages: Data, Transformation
Data
- Pro-Football: 3 tables, 40 columns, 82,346 rows
- Lisa Ordóñez: 1 table, 90 columns, 50,417 rows
Dependent Variables
- Play Decision, Intended Player, Play Direction, Yards
Calculated Variables
- GameNum, IsPlayChal, PlayZone, TotalOffTO, PlayDecision, QtrTimeLeft, HalfTimeLeft, GameTimeLeft
Independent Variables
- Defense, Down, GAP, Halftime Left, Off Ydl, Offense, Play Zone, QTR, ToGo, Total Off TO
Tools: SQL 2005 AS, SQL 2005 IS

Knowledge Discovery Process
Stages: Data, Transformation, Processing, Mining
Data
- Pro-Football: 3 tables, 40 columns, 82,346 rows
- Lisa Ordóñez: 1 table, 90 columns, 53,000 rows
Variables: Dependent, Calculated, Independent
Simple Statistics
- Play Decision
- Intended Player
- Play Direction
- Yards
Models
- ID3
- Neural Networks
Accuracy
- Lift Charts
- Classification Matrix
Tools: SQL 2005 AS, SQL 2005 IS, MS Excel 2007

Dependency Network

Intended Player: Statistics
Top 3 intended players for passes for the 4 teams that played in the semi-finals
- H.Ward (142), P.Burress (121), B.Shaw (44)
- T.Brown (143), D.Patten (93), M.Edwards (39)
- T.Holt (133), M.Faulk (104), I.Bruce (103)
- J.Thrash (107), D.Staley (89), T.Pinkston (83)

Play Direction: Statistics
Direction of rushes for all plays in the 2001 season
Directions: Left End, Left Tackle, Left Guard, Middle, Right Guard, Right Tackle, Right End
[Chart: Number of Rushes (0 to 600) by Direction]

Yardage: Statistics
Yardage during each down for Pass and Rush
[Charts: Rushes and Passes; Average Yards Covered (0 to 9) by Yards To Go (1 to 10, > 10) for Down 1, Down 2, Down 3]

Play Decision: Statistics
Play decisions for the 4 teams that played in the semi-finals
[Chart: Number of Decisions (0 to 60) by Play Decision Type (Kneel, Field goal, 1pt extra) for New England, Philadelphia, Pittsburgh, St. Louis]

Play Decision: Analysis Overview
- Discovery of what environmental and/or game factors affect play decision
- Discovery of football expert knowledge through data mining
- Prediction of play decisions based on game factors

Play Decision: ID3 Analysis

Play Decision: Accuracy
- Rush Accuracy: Lift Chart
- Field Goal Accuracy: Lift Chart

Play Decision: Classification Matrix

Play Decision: Key Findings
- Football strategy can be discovered through data, instead of knowledge experts
- Top 3 factors affecting decision: Down, Off Ydl, Time
- Accuracy of the models differs depending on the decision we are trying to predict
- Team-specific strategies may be discovered with more data
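The slides name ID3 as the model behind the play-decision findings but do not show how a factor such as Down comes to rank first. ID3 picks the attribute with the highest information gain at each split. The sketch below illustrates that calculation in plain Python; it is a stand-in for, not a reproduction of, the SQL 2005 AS models used in the project. The eight sample plays and the PlayZone values ("Own", "Opp") are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Hypothetical plays, NOT taken from the 2001 dataset.
plays = [
    {"Down": 1, "PlayZone": "Own", "PlayDecision": "Rush"},
    {"Down": 1, "PlayZone": "Opp", "PlayDecision": "Rush"},
    {"Down": 2, "PlayZone": "Own", "PlayDecision": "Rush"},
    {"Down": 2, "PlayZone": "Opp", "PlayDecision": "Pass"},
    {"Down": 3, "PlayZone": "Own", "PlayDecision": "Pass"},
    {"Down": 3, "PlayZone": "Opp", "PlayDecision": "Pass"},
    {"Down": 4, "PlayZone": "Own", "PlayDecision": "Punt"},
    {"Down": 4, "PlayZone": "Opp", "PlayDecision": "Field goal"},
]

# Score each candidate attribute; ID3 would split on the highest-scoring one.
gains = {
    attribute: information_gain(plays, attribute, "PlayDecision")
    for attribute in ("Down", "PlayZone")
}
best_attribute = max(gains, key=gains.get)

for attribute, gain in gains.items():
    print(f"{attribute}: {gain:.4f} bits")
print("Split on:", best_attribute)
```

On this toy sample Down separates the decisions far better than PlayZone, which is the kind of ranking the "top factors" finding summarizes. A full ID3 tree would repeat the same scoring recursively within each branch.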
Play Direction: Analysis Overview
- Discover teams' strengths and weaknesses in their defense and/or offense
- Prediction of play directions based on game factors
- Directions: Left End, Left Tackle, Left Guard, Middle, Right Guard, Right Tackle, Right End

Play Direction: Accuracy

Play Direction: Key Findings (ID3)

Intended Player: Analysis Overview
- Discover each team's favored recipient of a pass
- Prediction of intended player based on game factors

Intended Player: Lift Chart

Intended Player: Key Findings
- There are 400+ intended players
- Not enough data to accurately predict intended players
- Not enough data to gain knowledge over statistical models

Conclusions
INTENDED PLAYERS
- Insufficient data
- No knowledge gained
- Need to increase sample size
PLAY DIRECTION
- Less accurate
- Enough data to gain knowledge
PLAY DECISION
- Accurate
- Gained knowledge

Future Direction
- Increase sample set: more instances of different scenarios
- Incorporate additional information: Pro-football-Reference.com, VegasInsider.com (odds for favorites)
- Extend analysis: nested case (historical performance)

References
- Prof. Lisa Ordóñez, Professor in Statistics
- Steve Aldrich, Author of Moneyball in Football
- About Football: Glossary of terms

[Slide index: Research Objectives, Literature Overview, Knowledge Discovery, Statistics: Intended Player, Statistics: Play Direction, Statistics: Yardage, Statistics: Play Decision, Accuracy: Lift Charts, Analysis: Play Decision, Analysis: Play Direction, Analysis: Intended Player, Conclusions, Future Directions, System Design]

Backup Slide Section

Data Collection
- Sources: Football Outsiders, Pro-Football
- Initial dataset: 55,000 rows, 90 columns
Processing
- Cleaning, Hierarchy, Relevance
- 47,033 rows, 30 columns
Analysis
- Variables: Dependent (4), Independent (10), Calculated (9)

System Design
[Diagram: NFL KMS. Football data (NFL Season 2001 DB); Model Building; Testing/Accuracy; Pattern Analysis; Defense Strategy; Metrics (Accuracy, Performance); Field Strategy (Formations, Substitutions, Play Decisions)]

Yards Analysis
- Yards gained on the play is used as a metric to measure effort
- Discover how environmental and/or game factors affect players' efforts
- Key Findings: top 4 environmental factors are Off Ydl, Time, Down, Gap
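Time appears among the top factors in both the play-decision and yards findings, and the Knowledge Discovery Process slides list QtrTimeLeft, HalfTimeLeft, and GameTimeLeft as calculated variables without defining them. One plausible derivation is sketched below. It assumes regulation play only (four 15-minute quarters, no overtime), a game clock given as "MM:SS" remaining in the current quarter, and results expressed in seconds; the function name `time_left` and its inputs are hypothetical, not taken from the project.

```python
QUARTER_SECONDS = 15 * 60  # length of a regulation NFL quarter


def time_left(qtr, clock):
    """Derive the three time variables from a quarter number and a game clock.

    qtr   -- quarter number, 1 through 4 (overtime is not handled)
    clock -- time remaining in the quarter, formatted "MM:SS"
    """
    if not 1 <= qtr <= 4:
        raise ValueError("only regulation quarters 1-4 are supported")

    minutes, seconds = clock.split(":")
    qtr_left = int(minutes) * 60 + int(seconds)

    # Quarters 1 and 3 are followed by one more full quarter in the same half.
    quarters_after_in_half = qtr % 2
    half_left = qtr_left + quarters_after_in_half * QUARTER_SECONDS

    # Every later quarter in the game is still to be played in full.
    game_left = qtr_left + (4 - qtr) * QUARTER_SECONDS

    return {
        "QtrTimeLeft": qtr_left,
        "HalfTimeLeft": half_left,
        "GameTimeLeft": game_left,
    }


print(time_left(2, "02:00"))
```

Keeping all three variables lets a model distinguish, for example, two minutes left in the first half from two minutes left in the game, which share the same QtrTimeLeft but differ in GameTimeLeft.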