Massive Choice Data
Co-Chairs: Prasad Naik and Michel Wedel
7th Triennial Choice Symposium
Wharton Business School
June 13-17, 2007
Impetus for “Massive” Data?
Technological advances (Internet, RFID)
Computing advances
Methodological advances
Detailed data
Large sample, N
Many variables, p
Long time-series, T
Several products and SKUs, K
Different Types of Massive Data
Structured Data
Scanner panel, Loyalty card, CRM, Click-stream
Unstructured Data
Text data (e.g., product reviews, blogs, complaints)
Images, Music
Emerging Data Types
RFID, video, social networks, recommendations, auctions, games, eye tracking, semantic Web 2.0
Is the data set just getting bigger?
What is the qualitative difference?
Sometimes Nothing
Just a scale up problem
But the bigger size makes it harder to analyze in real-time
Sometimes Everything
Empty space phenomenon
Statistical Inference, diagnostics, sparseness
Visualization becomes tricky when p > 10
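Illustration of the empty space phenomenon (a minimal sketch, not from the session itself). It assumes observations spread uniformly over the unit hypercube [0, 1]^p and asks how many fall in a centred sub-cube whose side is half the range of each variable. The function names and the choice of side = 0.5 are illustrative assumptions.

```python
def central_fraction(p, side=0.5):
    """Share of the unit hypercube [0, 1]^p covered by a centred
    sub-cube with the given side length (assumes 0 < side <= 1)."""
    return side ** p


def expected_points_in_center(n, p, side=0.5):
    """Expected number of n uniformly scattered points that land
    inside the centred sub-cube."""
    return n * central_fraction(p, side)


if __name__ == "__main__":
    n = 1_000_000  # a "massive" sample by 2007 standards
    for p in (1, 2, 5, 10, 20):
        print(p, central_fraction(p), expected_points_in_center(n, p))
```

Under these assumptions the covered share shrinks geometrically in p, so even a million observations leave the centre of a 20-variable space nearly empty. This is one way to see why sparseness affects inference and diagnostics as p grows.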
Managers and Models
Managers need
real-time computation
decision optimization
Man – Machine engagement
managerial inputs plus data analyses
Models need to be both
Simple for quick computation (real-time decisions),
Complex for realism in assumptions
How?
The notion of “Workbench”
Model averaging, forecast combination
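Forecast combination can be sketched as follows. This is one common scheme (weights inversely proportional to each model's past mean squared error), shown as an assumption for illustration; the slides do not say which combination rule was discussed. All function names and the toy numbers are hypothetical.

```python
def mse(forecasts, actuals):
    """Mean squared error of one model's past forecasts."""
    return sum((f - a) ** 2 for f, a in zip(forecasts, actuals)) / len(actuals)


def inverse_mse_weights(model_forecasts, actuals, eps=1e-12):
    """Weights proportional to 1 / MSE, normalised to sum to one.
    eps guards against division by zero for a model with a perfect record."""
    inverse = [1.0 / max(mse(f, actuals), eps) for f in model_forecasts]
    total = sum(inverse)
    return [v / total for v in inverse]


def combine(weights, new_forecasts):
    """Weighted average of the models' forecasts for a new period."""
    return sum(w * f for w, f in zip(weights, new_forecasts))


if __name__ == "__main__":
    actuals = [10.0, 12.0, 11.0]
    simple_model = [11.0, 13.0, 12.0]   # off by 1 each period
    complex_model = [12.0, 14.0, 13.0]  # off by 2 each period
    w = inverse_mse_weights([simple_model, complex_model], actuals)
    print(w, combine(w, [10.0, 20.0]))
```

The design choice here is that each model stays cheap to run on its own, while the combination step borrows strength from several of them.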
Estimation and Computation
Estimation methods
Identified promising approaches for massive data analysis
Inverse regression methods
Regularization techniques (e.g., Lasso)
Particle filters
Logistic regression or Support Vector Machines
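Of the approaches listed above, regularization is the easiest to show compactly. The sketch below fits a Lasso by cyclic coordinate descent in plain Python, minimising (1/(2n)) * ||y - X b||^2 + lam * ||b||_1. It assumes no intercept and columns already on comparable scales; the function names, the fixed iteration count, and the toy data are illustrative assumptions, not the presenters' implementation.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Lasso via cyclic coordinate descent.
    X is a list of n rows with p entries each; y is a list of n values."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, with beta starting at zero
    z = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if z[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the residual that excludes j's own fit
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / z[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


if __name__ == "__main__":
    # Toy data: y depends strongly on column 0 and only weakly on column 1
    X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
    y = [2.05, -1.95, 1.95, -2.05]
    print(lasso_cd(X, y, lam=0.1))
```

In the toy data the weak coefficient is set exactly to zero while the strong one is shrunk slightly, which is the variable-selection behaviour that makes the Lasso attractive when p is large.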
Computation power
Grid computing is needed
Waiting for faster computers is not an option
Gap between industry and academia
Google has 2 million processors
Directions and Action Points
Incentives for academics?
Industry-Academic partnerships
Cross-disciplinary collaborations
Thank you for this forum to share ideas!
Credits
Lynd Bacon (LBA Inc)
Anand Bodapati (UCLA)
Wagner Kamakura (Duke)
Jeffrey Kreulen (IBM Research)
Peter Lenk (Michigan)
David Madigan (Rutgers)
Alan Montgomery (CMU)