Bank of America
BIA 660 Web Analytics - Midterm
Akshta Chougule, Hao Han, Di Huo, Xi Lu, Laura Sills
Business Problem
Customer strategy: grow the customer base by forming lifelong banking relationships with young adults.
The current account demographics report shows:
● fewer new student accounts
● an increase in account cancellations by the young adult demographic
Impact: Bank of America is losing market share to other banks.

Business Questions
● What is Bank of America’s reputation with this age group? Do they like Bank of America or not?
● How does Bank of America compare with other banks?
● Are customers in this demographic group unhappy with the bank’s services?
● Are there any banking products that customers in this group want but Bank of America does not offer?

Source of Information
Online social media sites are a good source of comments from this age group.

YouTube Statistics
● More than 1 billion unique users monthly
● Nielsen ratings show that YouTube reaches more US adults ages 18-34 than any cable network
http://www.youtube.com/yt/press/statistics.html

Demographics of Reddit
http://www.theatlantic.com/technology/archive/2013/07/reddit-demographics-in-onechart/277513/

What Do People Think About Banks?
Share of banking key-word mentions by source:
Topic                     Reddit  YouTube  Twitter
mortgage                      5%       6%      30%
loan                          5%      13%       0%
fraud                         6%       7%       0%
insurance                     1%       2%       0%
branch                        3%       1%       0%
hours                         2%       1%       0%
account                      19%      16%      20%
overdraft                     8%       1%       0%
bailout                       1%       6%       0%
fee                          18%      11%      20%
customer                     13%       8%       0%
representative / teller       7%      18%      20%
[credit] union               10%       7%      10%
computer                      1%       1%       0%
CEO                           2%       2%       0%

Data Gathering and Validation
Python was used to obtain comments from the web:
● Crawling Reddit
● Twitter API
● YouTube API

Data Cleansing and Exploration
● Remove incomplete comments, extra whitespace, punctuation, and stopwords
● Use Python to count word frequencies in the comments and identify banking-related “key words”
● A word scan confirmed the key words

Gathering data from Twitter
● Technique: Twitter API
● Volume of tweets collected: BOA 125 KB, Citibank 104 KB, Chase 100 KB
● Time span: 1 week
● Fields collected: tweet text, created_at, geocode

Data Processing
● Two word libraries: positive and negative
● Score each tweet

Tweets by Location

Data Processing
Summary of BOA tweet scores:
Min.       1st Qu.    Median     Mean       3rd Qu.    Max.
-0.20000   -0.04348   0.00000    -0.01176   0.02857    0.20000
● Good or bad?
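The cleansing, key-word counting and lexicon scoring steps above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the team's code: the stopword list, the positive/negative word lists, the `min_tokens` threshold that defines an "incomplete" comment, and the simple plural folding are all hypothetical stand-ins, because the slides do not name the libraries used. The score divides (positive hits minus negative hits) by the number of words in the tweet; that normalization is also an assumption, although it is consistent with the BOA summary values (for example -0.04348 ≈ -1/23 and 0.02857 ≈ 1/35). The summary layout appears to mirror R's `summary()`, so quartiles use `method="inclusive"`, which interpolates the same way as R's default quantile type.

```python
import re
import statistics
from collections import Counter

# Hypothetical word lists for illustration only; the slides do not name the
# stopword list or the positive/negative "libraries" that were actually used.
STOPWORDS = {"a", "an", "and", "at", "for", "i", "in", "is", "it", "my",
             "of", "on", "the", "to", "with"}
POSITIVE = {"good", "great", "helpful", "love", "easy", "thanks"}
NEGATIVE = {"bad", "terrible", "hate", "rude", "slow", "worst"}
# Banking key words from the topic table ("[credit] union" -> "union",
# "representative / teller" -> two separate words).
KEYWORDS = {"mortgage", "loan", "fraud", "insurance", "branch", "hours",
            "account", "overdraft", "bailout", "fee", "customer",
            "representative", "teller", "union", "computer", "ceo"}


def clean(comment, min_tokens=3):
    """Lowercase, strip punctuation and extra whitespace, drop stopwords.

    Returns None for comments treated as incomplete (fewer than min_tokens
    words left); the threshold is an assumption.
    """
    text = re.sub(r"[^\w\s]", " ", comment.lower())
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return tokens if len(tokens) >= min_tokens else None


def keyword_shares(token_lists):
    """Fraction of all key-word mentions that falls on each key word."""
    counts = Counter()
    for tokens in token_lists:
        for t in tokens:
            # Fold a simple plural ("fees" -> "fee") onto its key word.
            word = t[:-1] if t not in KEYWORDS and t.endswith("s") else t
            if word in KEYWORDS:
                counts[word] += 1
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


def score(tokens):
    """(positive hits - negative hits) / number of remaining words."""
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens) if tokens else 0.0


def summary(scores):
    """Min, quartiles, mean and max, in the layout of R's summary()."""
    q1, med, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {"Min.": min(scores), "1st Qu.": q1, "Median": med,
            "Mean": statistics.fmean(scores), "3rd Qu.": q3,
            "Max.": max(scores)}


if __name__ == "__main__":
    # Hypothetical example tweets.
    tweets = [
        "The overdraft fees at my branch are terrible!!",
        "Great help from the teller today, thanks",
        "ok",  # incomplete, so clean() drops it
        "Applied for a mortgage   loan with the bank",
    ]
    cleaned = [c for c in (clean(t) for t in tweets) if c is not None]
    print(keyword_shares(cleaned))
    print(summary([score(c) for c in cleaned]))
```

Dividing by tweet length keeps long and short tweets on the same scale, which is why the scores stay within a narrow range such as -0.2 to 0.2.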
Competitor Analysis
Distribution of tweet scores
Mean score:
● BOA: -0.01176
● Citibank: -0.0006146
● Chase: -0.00731

Two-sample t-test
Null hypothesis: the true difference in means is equal to 0; alpha = 0.1
● BOA vs. Citibank: p-value = 0.0009004 < 0.1, so reject the null hypothesis (the means differ)
● Citibank vs. Chase: p-value = 0.06971 < 0.1, so reject the null hypothesis (the means differ)
● BOA vs. Chase: p-value = 0.2289 > 0.1, so fail to reject the null hypothesis (no significant difference)

Gathering data from YouTube
● Techniques: BeautifulSoup and gdata
● Comments collected for general analysis: 3097

YouTube data for each category
● Training data: 600
● Loan: 2430
● Account: 2700
● Service: 520

Naive Bayes Classification Algorithm
A naive Bayes classifier assumes that the presence or absence of a particular feature is unrelated to the presence or absence of any other feature, given the class variable.
The dataset (comments rated manually) was split into training and test data:
● Training (400)
● Testing (200)
● Predicting (5700)

Primary Categories of Customer Complaints

Accuracy of Classification
● Mortgage: 64.5%
● Accounts: 58.7%
● Service: 68.4%

Thank you!
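The two-sample t-tests and the naive Bayes classifier described in the slides above can be sketched with the Python standard library as follows. This is a hedged sketch, not the team's implementation, which the slides do not identify. The t statistic uses Welch's unequal-variance form (the default of R's `t.test`), and the p-value is approximated with a standard normal distribution, which is only reasonable for large samples such as a week of tweets; with SciPy installed, `scipy.stats.ttest_ind(a, b, equal_var=False)` gives exact t-distribution p-values. The classifier is a multinomial bag-of-words naive Bayes with add-one smoothing (NLTK's `NaiveBayesClassifier` or scikit-learn's `MultinomialNB` are common library alternatives); the class name `NaiveBayesText` and the toy comments are hypothetical.

```python
import math
import statistics
from collections import Counter, defaultdict


def welch_t_test(a, b):
    """Two-sample t-test (Welch form) with a two-sided p-value.

    The p-value uses a normal approximation to the t distribution, which is
    only reasonable for large samples.
    """
    se2 = statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(se2)
    p = 2 * (1 - statistics.NormalDist().cdf(abs(t)))
    return t, p


class NaiveBayesText:
    """Hypothetical bag-of-words naive Bayes with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        best, best_lp = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log P(class) + sum of log P(word | class): words are treated
            # as independent given the class (the "naive" assumption).
            lp = math.log(n / n_docs)
            for w in tokens:
                if w in self.vocab:  # ignore words never seen in training
                    lp += math.log((counts[w] + 1) / denom)
            if lp > best_lp:
                best, best_lp = label, lp
        return best


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted class matches the manual label."""
    hits = sum(model.predict(d) == y for d, y in zip(docs, labels))
    return hits / len(labels)


if __name__ == "__main__":
    # Hypothetical, manually labelled toy comments (already cleaned).
    train = [(["mortgage", "rate", "refinance"], "Mortgage"),
             (["home", "mortgage", "payment"], "Mortgage"),
             (["overdraft", "fee", "account"], "Account"),
             (["checking", "account", "closed"], "Account"),
             (["teller", "rude", "wait"], "Service"),
             (["customer", "service", "hold"], "Service")]
    docs, labels = zip(*train)
    nb = NaiveBayesText().fit(docs, labels)
    print(nb.predict(["mortgage", "refinance", "bank"]))
    print(accuracy(nb, docs, labels))  # training accuracy is optimistic

    # Hypothetical per-tweet scores for two banks.
    bank_a = [-0.2, 0.0, -0.05, 0.0, -0.1, 0.05]
    bank_b = [0.0, 0.05, 0.0, -0.05, 0.1, 0.0]
    t, p = welch_t_test(bank_a, bank_b)
    print(f"t = {t:.3f}, p = {p:.4f}, reject H0 at alpha = 0.1: {p < 0.1}")
```

In practice, accuracy should be measured on the held-out test comments (the 200 in the split above) rather than on the training comments, as the toy call here does.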