Team #6 Bill Cheng Sabina Del Rosso Stephen Hom

Download Report

Transcript Team #6 Bill Cheng Sabina Del Rosso Stephen Hom

Social Networking
Analytics for Calbee
(SNAC)
Team #6
Bill Cheng
Sabina Del Rosso
Stephen Hom
Omede Firouz Stacy Hsueh
Wei Jiang
Thoranis Karnasuta
CLIENT
EER/SCHEMA
NORMALIZATION
QUERIES
Professor Ken Goldberg. IEOR 115. December 9, 2011.
DATABASE
Client Background: Calbee San Francisco
• CALBEE, Inc. is one of the largest snack
companies in Japan
– Company based on the premise of good health
• Calbee, San Francisco is the company’s first
US-based flagship store
– Founded in early 2011
– Located in Westfield Mall
• Active in social media
– Website, Facebook, Twitter
CLIENT
EER/SCHEMA
NORMALIZATION
QUERIES
DATABASE
Image from calbeeshop.com
Current Infrastructure
• Currently do not keep track of social media
hits on any site
• Use Point of Sale for sales data and employee
clock-ins
CLIENT
EER/SCHEMA
NORMALIZATION
QUERIES
DATABASE
Image from http://www.unrealstudio.com
Database Objectives
• Handle future expansion into e-commerce
• Increase social media marketing in targeted
demographics
• View effect of promotions on sales and social
media to help better cater future promotions
• Provide a foundation to maximize profits
– Logistic management using integer programming
– Data mining and machine learning to predict sales
CLIENT
EER/SCHEMA
NORMALIZATION
QUERIES
DATABASE
EER Diagram
CLIENT
EER/SCHEMA
NORMALIZATION
QUERIES
DATABASE
Relational Design Schema
Relational Design Schema (46 relations)
Promotion/Sales/Retail: Relations Numbered 0-9
1. PRODUCT(ProdID, Name, IsSour, IsSweet, IsSalty, IsSavory, ManufCost, RetailPrice)
2. PURCHASE(PurchaseID, ProdID1, PromoID3a, CustID6a, StoreID4a, EmpID5a, Timestamp, ipAddress)
3a. PROMOTION(PromoID, PromoCode, StoreID, StartDate, EndDate, Discount)
3b. PROMOTION_SPREAD_VIA_TWITTER(PromoID3a, TweetID10c)
3c. PROMOTION_SPREAD_VIA_F(PromoID3a, F_CID11c)
3d. PROMOTION_SPREAD_VIA_G+(PromoID3a, G_CID12c)
3e. PROMOTION_SPREAD_VIA_S(PromoID3a, S_DID13c)
3f. PROMOTION_SPREAD_VIA_B(PromoID3a, BPost_ID14a)
3g. PROMOTION_INFO_VIA_W(PromoID3a, url15)
4a. STORE(StoreID, AddressNo,StreetName, City, Country, ZipCode, PhoneNo)
4b. STORE_CARRIES(StoreID4a, ProdID1, Stock)
5a. EMPLOYEE(EmpID, LName, FName, Position, FavProdID1, StoreID4, AddressNo,StreetName, City, State, Country, ZipCode, SSN)
5b. EMPLOYEE_IS_FRIEND(EmpID5a, T_UID10a, F_UID11a, G_UID12a, S_UID13a)
5c. EMPLOYEE_IS_CUSTOMER(EmpID5a, CustID6a)
6a. CUSTOMER(CustID, LName, FName, AddressNo, StreetName, City, State, Country, ZipCode, FavProd1, BirthDate)
6b. CUSTOMER_IS_FRIEND(CustID6a, T_UID10a, F_UID11a, G_UID12a, S_UID13a)
8a. PRODUCT_AD(P_Ad_ID, ProductID1, DateBeginAd, DateEndAd, F_or_G_Ad)
8b. STORE_AD(S_Ad_ID, Store_ID, DateBeginAd, DateEndAd, F_or_G_Ad)
8c. F_P_AD_CLICKED(P_Ad, F_UID, Timestamp, ipAddress)
8d. G_P_AD_CLICKED(P_Ad_ID, G_UID, Timestamp, ipAddress)
8e. F_S_AD_CLICKED(S_Ad_ID, F_UID, Timestamp, ipAddress)
8f. G_S_AD_CLICKED(S_Ad_ID, G_UID, Timestamp, ipAddress)
CLIENT
EER/SCHEMA
NORMALIZATION
QUERIES
DATABASE
Relational Design Schema Cont.
Social Media: Relations Numbered 10-19
10a. T_USER(T_UID, T_Username, Fname, Lname, City, State, BirthDate, Email)
10b. T_FOLLOWING(T_UID10a, Follower_T_UID10a, DateBeganFollowing)
10c. TWEET(TweetID, T_UID10a, Auth_T_UID10a, TextStr, Timestamp)
11a. F_USER(F_UID, Fname, Lname, City, State, BirthDate, Email)
11b. F_FRIENDS(F_UID11a, Friend_F_UID11a, DateBecameFriends)
11c. F_COMMENT(F_CID, Auth_F_UID11a, On_F_CID11c, TextStr, Timestamp)
11d. F_LIKE(F_CID11c, F_UID11a, Timestamp)
12a. G_USER(G_UID, Fname, Lname, City, State, BirthDate, Email)
12b. G_FRIENDS(G_UID12a, Friend_G_UID12a, DateBecameFriends)
12c. G_COMMENT(G_CID, Auth_G_UID12a, On_G_CID12c, TextStr, Timestamp)
12d. G_LIKE(G_CID12c, G_UID12a, Timestamp)
13a. S_USER(S_UID, Fname, Lname, City, State, BirthDate, Email)
13b. S_FOLLOWING(S_UID13a, Follower_S_UID13a, DateBeganFollowing)
13c. S_DISCOVERY(S_DID, S_UID13a, url, Timestamp)
13d. S_REVIEW(S_DID13c, S_UID13a, TextStr, Like/Dislike, Timestamp)
14a. BLOG_POST(url, BPost_ID, Author_Emp_ID5a, TextStr, Timestamp)
14b. BLOG_COMMENT(BComment_ID, url, BPost_ID14a, TextStr, Timestamp, ipAddress)
14c. ASSOCIATE_IP_T(T_UID10a, Timestamp, ipAddress)
14d. ASSOCIATE_IP_FB(F_UID11a, Timestamp, ipAddress)
14e. ASSOCIATE_IP_G(G_UID12a, Timestamp, ipAddress)
14f. ASSOCIATE_IP_S(S_UID13a, Timestamp, ipAddress)
15. MAIN_WEBSITE(url, link_to_html_file, Timestamp)
CLIENT
EER/SCHEMA
NORMALIZATION
QUERIES
DATABASE
Relational Design Schema Cont.
Other Data: Relations Numbered 20-29
20a. GOOGLE_TREND(GT_ID, word, city, country, day, hits)
20b. RELATED_TREND(word, Related_Prod_ID1)
CLIENT
EER/SCHEMA
NORMALIZATION
QUERIES
DATABASE
Access Table Relationships
CLIENT
EER/SCHEMA
NORMALIZATION
QUERIES
DATABASE
Access Table Relationships Cont.
CLIENT
EER/SCHEMA
NORMALIZATION
QUERIES
DATABASE
Normalization Analysis: 1NF
Removal of a multi-valued attribute (flavor):
CLIENT
EER/SCHEMA
NORMALIZATION
QUERIES
DATABASE
Normalization Analysis: 2NF
Removal of a partial FD:
{PromoID}  {PromoCode, StoreID, StartDate, EndDate, Discount}
CLIENT
EER/SCHEMA
NORMALIZATION
QUERIES
DATABASE
Normalization Analysis: 3NF
Removal of a transitive FD:
{T_UID}  {T_Username, Fname, Lname, City, State, BirthDate, Email}
CLIENT
EER/SCHEMA
NORMALIZATION
QUERIES
DATABASE
Normalization Analysis: BCNF
Removal of a FD with a non-superkey attribute on the LHS:
{PromoCode}  {StartDate, EndDate, Discount}
CLIENT
EER/SCHEMA
NORMALIZATION
QUERIES
DATABASE
Query 1: Popular Product Stock
Find out the most talked about products in a
city and their quantities (stock). This will help us
determine which products to move around to
balance inventories in expectation of sale
increases. Data can be exported to a solver to
do a shipment problem.
CLIENT
EER/SCHEMA
NORMALIZATION
QUERIES
DATABASE
Query 1: SQL
SELECT Product.ProdID, Product.ProdName, (SELECT
COUNT(F_Comment.F_CID) FROM F_Comment WHERE F_Comment.TextStr
LIKE '*' + Product.ProdName + '*') AS Hits, Store.City, Store_Carries.StoreID
AS Store, Store_Carries.Stock AS Stock
FROM Product, Store, Store_Carries
WHERE (((Product.ProdID)=[Store_Carries].[ProdID]) AND
((Store.StoreID)=[Store_Carries].[StoreID]))
ORDER BY Product.ProdName;
CLIENT
EER/SCHEMA
NORMALIZATION
QUERIES
DATABASE
Query 1: Output
CLIENT
EER/SCHEMA
NORMALIZATION
QUERIES
DATABASE
Query 1: Data Analysis
• We have a list of stores and their stock of
different products
– Transportation problem to encourage similar
levels of stock
– Minimize shipments, shipping costs, etc.
• Subject to: No outliers (stores with low stock)
Possible shipment constraints
Possible traffic constraints
Etc.
CLIENT
EER/SCHEMA
NORMALIZATION
QUERIES
DATABASE
Query 1: Data Analysis (AMPL)
CLIENT
EER/SCHEMA
NORMALIZATION
QUERIES
DATABASE
Query 1: Data Analysis
CLIENT
EER/SCHEMA
NORMALIZATION
QUERIES
DATABASE
Query 2: Promo Social Networking
Consider a promotion. Compare product social
network comments in a given city two weeks
before, during, and two week after a promotion
to judge its effectiveness.
Order by the return on the investment.
CLIENT
EER/SCHEMA
NORMALIZATION
QUERIES
DATABASE
Query 2: SQL
SELECT Promotion.PromoID, (SELECT COUNT(*)
FROM F_Comment, Product
WHERE F_Comment.TextStr LIKE '*' + Product.ProdName + '*'
AND Product.ProdID = Promotion.ProdID
AND F_Comment.Timestamp < Promotion.StartDate
AND F_Comment.Timestamp > Promotion.StartDate - 14) AS HitsBefore, (SELECT COUNT(*)
FROM F_Comment, Product
WHERE F_Comment.TextStr LIKE '*' + Product.ProdName + '*'
AND Product.ProdID = Promotion.ProdID
AND F_Comment.Timestamp < Promotion.EndDate
AND F_Comment.Timestamp > Promotion.StartDate) AS HitsDuring, (SELECT COUNT(*)
FROM F_Comment, Product
WHERE F_Comment.TextStr LIKE '*' + Product.ProdName + '*'
AND Product.ProdID = Promotion.ProdID
AND F_Comment.Timestamp < Promotion.EndDate + 14
AND F_Comment.Timestamp > Promotion.EndDate) AS HitsAfter, (SELECT
SUM(Promotion.Discount*Product.RetailPrice)
FROM Product) AS PromoCost
FROM Promotion
ORDER BY PromoCost;
CLIENT
EER/SCHEMA
NORMALIZATION
QUERIES
DATABASE
Query 3: Priority Customers
Use friendship data to rate friends by how many
recommendations they have made.
Determine how many of a person's friends
became friends with us after they became friends
with us.
In this way, we identify possible priority customers
of Calbee to target for special advertisements and
promotions.
CLIENT
EER/SCHEMA
NORMALIZATION
QUERIES
DATABASE
Query 3: SQL
SELECT F.F_UID, ( SELECT COUNT(*)
FROM F_Friends AS F2
WHERE F2.F_UID = F.F_UID
AND EXISTS( SELECT F3.DateBecameFriends
FROM F_FRIENDS F3
WHERE F3.Friend_F_UID = 1
AND F3.F_UID = F2.Friend_F_UID
AND F3.DateBecameFriends > F.DateBecameFriends)) AS friendCount
FROM F_Friends AS F
WHERE F.Friend_F_UID = 1;
CLIENT
EER/SCHEMA
NORMALIZATION
QUERIES
DATABASE
Query 3: Data Analysis
CLIENT
EER/SCHEMA
NORMALIZATION
QUERIES
DATABASE
Query 4: Google Trends and Stocks
Determine priority stores which don’t stock
products that they should, as determined by
google trend word popularity.
For a given google trend word, find the top 5
cities in which the word is most searched in
year 2011. Then, find stores in those cities and
which related products they do not stock. This
will help us identify how to improve inventory.
CLIENT
EER/SCHEMA
NORMALIZATION
QUERIES
DATABASE
Query 4: SQL
SELECT Store.StoreID AS Store, Store.City AS City, Product.ProdID AS
Prod
FROM Store, Store_Carries, Product
WHERE Store.City IN (SELECT TOP 5 Google_Trend.City
FROM Google_Trend
WHERE Google_Trend.Word = 'test')
AND Store_Carries.StoreID = Store.StoreID
AND Store_Carries.ProdID = Product.ProdID
AND Store_Carries.Stock = 0
AND Product.ProdID IN (SELECT Related_Trend.Related_Prod_ID
FROM Related_Trend
WHERE Related_Trend.Word = 'test');
CLIENT
EER/SCHEMA
NORMALIZATION
QUERIES
DATABASE
Query 5: Social Network and Purchases
Gather Social Networking, Google Trend, and
Purchase data over time to formulate predictive
models.
For a given product, find the number of social
network hits of a product, the related trend word
hits, and the number of purchases in that product
for a given city on a given day. In this way, we can
use social network 'buzz' and trend data to predict
purchases as a function of time and city. Order by
product then timestamp.
CLIENT
EER/SCHEMA
NORMALIZATION
QUERIES
DATABASE
Query 5: SQL
SELECT Product.ProdID, Product.ProdName, Purchase.Timestamp,
(SELECT COUNT(F_Comment.F_CID)
FROM F_Comment
WHERE F_Comment.TextStr LIKE '*' + Product.ProdName + '*'
AND F_Comment.Timestamp = Purchase.Timestamp) AS
SocialNetworkHits, (SELECT SUM(Google_Trend.hits)
FROM Google_Trend
WHERE Google_Trend.word = Product.ProdName
AND Google_Trend.Timestamp = Purchase.Timestamp) AS TrendHits
FROM Product, Purchase
WHERE Purchase.ProdID = Product.ProdID
ORDER BY Purchase.ProdID, Timestamp;
CLIENT
EER/SCHEMA
NORMALIZATION
QUERIES
DATABASE
Query 5: Output
CLIENT
EER/SCHEMA
NORMALIZATION
QUERIES
DATABASE
Query 5: Data Analysis
CLIENT
EER/SCHEMA
NORMALIZATION
QUERIES
DATABASE
Query 5: Data Analysis
• Social media networking, Google Trends, and
Purchases data used predictively
– Group into weekly vectors
– Extract significant data using Principle Component
Analysis to project onto 2 dimensions.
– Cluster data using K-Means
 See if we can predict future sales using machine
learning
CLIENT
EER/SCHEMA
NORMALIZATION
QUERIES
DATABASE
Query 5: Data Analysis
CLIENT
EER/SCHEMA
NORMALIZATION
QUERIES
DATABASE
Login Interface
Employees login here:
CLIENT
EER/SCHEMA
NORMALIZATION
QUERIES
DATABASE
Switchboard
Allows employees to insert data in forms or run
selected query
CLIENT
EER/SCHEMA
NORMALIZATION
QUERIES
DATABASE
Forms: New Employee
Enter information on new Calbee employees
CLIENT
EER/SCHEMA
NORMALIZATION
QUERIES
DATABASE
Questions?
Thank you!
CLIENT
EER/SCHEMA
NORMALIZATION
QUERIES
DATABASE