Bank Of America BIA 660 Web Analytics - Midterm Akshta Chougule Hao Han Di Huo Xi Lu Laura Sills.

Bank Of America
BIA 660 Web Analytics - Midterm
Akshta Chougule
Hao Han
Di Huo
Xi Lu
Laura Sills
Business Problem
Customer Strategy: grow base by forming lifelong banking relationships with young adults
Current Account Demographics Report Shows
● fewer new student accounts
● increase in cancellation of accounts by
the young adult demographic
Impact: Losing market share to other banks
Business Questions
● What is Bank of America’s reputation with
this age group - do they like Bank of America
or not?
● How does Bank of America compare to other
banks?
● Are customers in this demographic group
unhappy with the bank’s services?
● Are there banking products that customers
in this group want but that Bank of America
does not offer?
Source of Information
Online social media sites are a good source for
comments from this age group
YouTube Statistics
● More than 1 billion unique users monthly
● Nielsen ratings show that YouTube reaches
more US adults ages 18-34 than any
cable network
http://www.youtube.com/yt/press/statistics.html
Demographics of Reddit
http://www.theatlantic.com/technology/archive/2013/07/reddit-demographics-in-onechart/277513/
What do People Think About Banks?
Topic                     Reddit   YouTube   Twitter
mortgage                  5%       6%        30%
loan                      5%       13%       0%
fraud                     6%       7%        0%
insurance                 1%       2%        0%
branch                    3%       1%        0%
hours                     2%       1%        0%
account                   19%      16%       20%
overdraft                 8%       1%        0%
bailout                   1%       6%        0%
fee                       18%      11%       20%
customer                  13%      8%        0%
representative / teller   7%       18%       20%
[credit] union            10%      7%        10%
computer                  1%       1%        0%
CEO                       2%       2%        0%
Data Gathering and Validation
Use Python to obtain comments from the web
● Crawling Reddit
● API for Twitter
● API for YouTube
Data Cleansing and Exploration
● Remove incomplete comments, extra
whitespace, punctuation, and stopwords
● Explore data using Python to analyze the
frequency of words in the comments in order
to identify “key words” related to banking
● Word scan confirmed the key words
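The cleansing and word-frequency steps above can be sketched roughly as follows. This is a minimal illustration, not the code used for the project: the stopword list, the function names `clean_comment` and `word_frequencies`, and the sample comments are all assumptions made for the example, and the removal of incomplete comments is not covered.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list; a real run would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "is", "my", "i", "for", "on", "in", "it"}

def clean_comment(text):
    """Lowercase, replace punctuation with spaces, split on whitespace, drop stopwords."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return [w for w in text.split() if w not in STOPWORDS]

def word_frequencies(comments):
    """Count how often each remaining word appears across all comments."""
    counts = Counter()
    for comment in comments:
        counts.update(clean_comment(comment))
    return counts

# Hypothetical sample comments, only to exercise the functions.
comments = [
    "The overdraft fee on my account is too high!!",
    "Closed my account   because of the fee.",
]
freq = word_frequencies(comments)
print(freq.most_common(3))
```

Words that rise to the top of such a count (here "fee" and "account") are candidates for the banking key words.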
Gathering data from Twitter
● Technique: Twitter API
● Volume of tweets collected (file size):
BOA: 125 KB
Citibank: 104 KB
Chase: 100 KB
● Time span: 1 week
● Type of data:
Tweet text
Tweet created_at
Geocode
Data Processing
● Two word lists: positive and negative
● Score each tweet
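One plausible way to score a tweet against a positive and a negative word list is sketched below. The slides do not give the formula or name the word lists, so the normalization by tweet length, the tiny `POSITIVE` and `NEGATIVE` sets, and the function name `score_tweet` are assumptions for illustration only.

```python
import re

# Assumption: tiny illustrative word lists; the slides do not name the lists used.
POSITIVE = {"good", "great", "love", "helpful", "thanks"}
NEGATIVE = {"bad", "worst", "hate", "terrible", "awful"}

def score_tweet(text):
    """Return (positive hits - negative hits) / word count, or 0.0 for empty text."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)

# Hypothetical tweets, only to exercise the function.
for tweet in ["I love my bank", "Worst bank ever", "Opened an account today"]:
    print(tweet, "->", score_tweet(tweet))
```

Dividing by the word count keeps every score between -1 and 1, so long and short tweets are comparable.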
Tweets by Location
Data Processing
● Summary for BOA tweets:
Min.       1st Qu.    Median    Mean       3rd Qu.   Max.
-0.20000   -0.04348   0.00000   -0.01176   0.02857   0.20000
● Good or bad?
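A summary like the one above can be computed from a list of tweet scores as in this sketch. The function name `summarize` and the sample scores are assumptions, and the "inclusive" quartile method is chosen only because it is a common default; the slides do not say how the quartiles were calculated.

```python
import statistics

def summarize(scores):
    """Return min, quartiles, mean, and max for a list of numeric scores."""
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {
        "min": min(scores),
        "q1": q1,
        "median": median,
        "mean": statistics.fmean(scores),
        "q3": q3,
        "max": max(scores),
    }

# Hypothetical scores, only to exercise the function.
print(summarize([-0.2, -0.05, 0.0, 0.0, 0.03, 0.2]))
```

A median of zero with a mean slightly below zero, as in the BOA summary, points to a mostly neutral distribution with a small negative tilt.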
Competitor Analysis
Distribution of tweet scores
Mean:
BOA: -0.01176
Citibank: -0.0006146
Chase: -0.00731
Two-Sample T-test
Null hypothesis: true difference in means is equal to 0
Alpha = 0.1
● BOA and Citibank:
p-value = 0.0009004 < 0.1 (reject the null hypothesis)
● Citibank and Chase:
p-value = 0.06971 < 0.1 (reject the null hypothesis)
● BOA and Chase:
p-value = 0.2289 > 0.1 (fail to reject the null hypothesis)
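The comparison above can be sketched in Python as follows. The slides do not say which variant of the t-test was used, so this assumes Welch's unequal-variance version. To stay within the standard library, the p-value uses a normal approximation to the t distribution, which is only reasonable for large samples. The function name `welch_t_test` and the sample scores are hypothetical.

```python
import math
import statistics

def welch_t_test(a, b):
    """Welch two-sample t statistic with a normal-approximation two-sided p-value."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    se = math.sqrt(var_a / len(a) + var_b / len(b))
    t = (mean_a - mean_b) / se
    # Assumption: samples are large, so the t distribution is close to normal.
    p = 2 * (1 - statistics.NormalDist().cdf(abs(t)))
    return t, p

# Hypothetical score samples for two banks, only to exercise the function.
bank_a = [-0.05, 0.0, -0.1, 0.02, -0.03] * 40
bank_b = [0.01, 0.0, 0.05, -0.02, 0.03] * 40
t_stat, p_value = welch_t_test(bank_a, bank_b)
print(t_stat, p_value, "reject" if p_value < 0.1 else "fail to reject")
```

With alpha = 0.1, a p-value below 0.1 means the difference in mean scores between the two banks is treated as statistically significant.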
Gathering data from YouTube
● Techniques: BeautifulSoup, g.data
● Amount for general analysis: 3097 comments
YouTube data for each category
● Training data: 600
● Loan: 2430
● Account: 2700
● Service: 520
Naive Bayes Classification Algorithm
A naive Bayes classifier assumes that the presence or absence of a
particular feature is unrelated to the presence or absence of any other
feature, given the class variable.
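The independence assumption described above can be illustrated with a small from-scratch classifier. This is a sketch of a multinomial naive Bayes with add-one smoothing, which may differ from the variant and library used for the project. The class name `NaiveBayes`, the toy documents, and the labels are assumptions for the example.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial naive Bayes over word counts with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for words, label in zip(documents, labels):
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, words):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus per-word log likelihoods; summing them is the
            # independence assumption. Words never seen in training are skipped.
            score = math.log(n_docs / total_docs)
            for w in words:
                if w in self.vocab:
                    score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical, already-tokenized training comments.
docs = [
    ["mortgage", "rate", "loan"], ["loan", "payment", "mortgage"],
    ["account", "fee", "overdraft"], ["checking", "account", "fee"],
    ["teller", "rude", "service"], ["customer", "service", "wait"],
]
labels = ["Loan", "Loan", "Account", "Account", "Service", "Service"]
model = NaiveBayes().fit(docs, labels)
print(model.predict(["overdraft", "fee"]))
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together.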
Naive Bayes Classification Algorithm
Splitting the dataset into training and test data
(manual rating of comments)
● Training (400)
● Testing (200)
● Predicting (5700)
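The 400/200 split of the manually rated comments, and the accuracy check on the held-out 200, can be sketched as below. The helper names `split_labeled` and `accuracy`, the placeholder comments, and the always-"Account" baseline are assumptions used only to exercise the code; a trained classifier's prediction function would take the baseline's place.

```python
import random

def split_labeled(examples, n_train, seed=0):
    """Shuffle labeled examples and split them into training and test lists."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

def accuracy(predict, test_set):
    """Fraction of test examples whose predicted label matches the manual label."""
    correct = sum(predict(text) == label for text, label in test_set)
    return correct / len(test_set)

# Hypothetical stand-in data: 600 manually labeled comments, as on the slide.
labeled = [(f"comment {i}", "Account" if i % 2 else "Service") for i in range(600)]
train, test = split_labeled(labeled, n_train=400)

# Hypothetical baseline that always predicts one label, only to exercise accuracy().
def baseline(text):
    return "Account"

print(len(train), len(test), accuracy(baseline, test))
```

Measuring accuracy only on comments the classifier never saw during training gives a fairer estimate of how it will perform on the remaining unlabeled comments.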
Primary Categories of Customer Complaints
Accuracy of Classification
● Mortgage: 64.5%
● Accounts: 58.7%
● Service: 68.4%
Mortgage
Account
Service
Thank you!