Introduction to
Data Mining
Dr. Hany Saleeb
Why Data Mining? Potential Applications
Direct Marketing
identify which prospects should be included in a mailing list
Market segmentation
identify common characteristics of customers who buy the same products
Market Basket Analysis
identify which products are likely to be bought together
Insurance Claims Analysis
discover patterns of fraudulent transactions
compare current transactions against those patterns
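Market basket analysis is usually framed as finding association rules such as {pizza} → {beer}, scored by support (the fraction of baskets containing both items) and confidence (how often the consequent appears when the antecedent does). A minimal sketch, assuming hypothetical toy baskets and a brute-force count of item pairs only; production systems use algorithms such as Apriori or FP-growth to scale to larger itemsets:

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each basket is the set of items bought together.
BASKETS = [
    {"pizza", "beer", "chips"},
    {"pizza", "beer"},
    {"pizza", "soda"},
    {"beer", "chips"},
    {"pizza", "beer", "soda"},
]

def pair_rules(baskets, min_support=0.4):
    """Return {(antecedent, consequent): (support, confidence)} for frequent item pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    # Sorting each basket makes ("beer", "pizza") and ("pizza", "beer") the same key.
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # prune infrequent pairs
        # Confidence of a -> b is P(b | a) = count(a and b) / count(a).
        rules[(a, b)] = (support, count / item_counts[a])
        rules[(b, a)] = (support, count / item_counts[b])
    return rules

for (a, b), (support, confidence) in sorted(pair_rules(BASKETS).items()):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

With these baskets the rule pizza → beer has support 0.6 and confidence 0.75, while chips → pizza is pruned because the pair appears in only one basket.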
What Is Data Mining?
Combination of AI and statistical analysis to discover
information that is “hidden” in the data
associations (e.g. linking purchase of pizza with beer)
sequences (e.g. tying events together: marriage and purchase of furniture)
classifications (e.g. recognizing patterns such as the attributes of employees who are most likely to quit)
forecasting (e.g. predicting customers’ buying habits based on past patterns)
Expert systems or small ML/statistical programs
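Sequence patterns differ from associations in that order matters: marriage followed by a furniture purchase is a different pattern from the reverse. A minimal sketch, assuming hypothetical per-customer event histories that are already sorted by time, which measures the fraction of customers in whose history one event precedes another:

```python
from collections import Counter

# Hypothetical event histories, one time-ordered list per customer.
HISTORIES = {
    "c1": ["marriage", "furniture", "mortgage"],
    "c2": ["marriage", "mortgage", "furniture"],
    "c3": ["furniture", "marriage"],
    "c4": ["marriage", "furniture"],
}

def ordered_pair_support(histories):
    """Fraction of customers whose history contains event a followed later by event b."""
    counts = Counter()
    for events in histories.values():
        pairs = set()  # a set so each ordered pair counts at most once per customer
        for i, first in enumerate(events):
            for later in events[i + 1:]:
                if first != later:
                    pairs.add((first, later))
        counts.update(pairs)
    return {pair: c / len(histories) for pair, c in counts.items()}

support = ordered_pair_support(HISTORIES)
print(support[("marriage", "furniture")], support[("furniture", "marriage")])
```

Here ("marriage", "furniture") has support 0.75 while the reverse order has only 0.25, which is the kind of asymmetry a sequence miner looks for.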
What can data mining do?
Classification
– Classify credit applicants as low, medium, high risk
– Classify insurance claims as normal, suspicious
Estimation
– Estimate the probability of a direct mailing response
– Estimate the lifetime value of a customer
Prediction
– Predict which customers will leave within six months
– Predict the size of the balance that will be transferred by a credit card prospect
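Classification assigns each record to one of a set of known classes, such as low, medium, or high credit risk. As one illustrative technique (not one the slides prescribe), a k-nearest-neighbour classifier labels a new applicant by majority vote among the most similar past applicants. The features, values, and labels below are hypothetical:

```python
import math
from collections import Counter

# Hypothetical past applicants: (annual income in $1000s, debt-to-income %) -> risk class.
TRAINING = [
    ((85, 10), "low"),
    ((72, 15), "low"),
    ((50, 35), "medium"),
    ((45, 30), "medium"),
    ((25, 60), "high"),
    ((30, 55), "high"),
]

def knn_classify(applicant, training, k=3):
    """Label an applicant by majority vote among its k nearest training examples."""
    # Euclidean distance on raw features; real data would normally be rescaled first.
    nearest = sorted(training, key=lambda example: math.dist(applicant, example[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_classify((80, 12), TRAINING))
```

The same pattern applies to the insurance example: replace the features with claim attributes and the labels with "normal" and "suspicious".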
What can data mining do? (cont’d)
Association
– Find out which items customers are likely to buy together
– Find out what books to recommend to Amazon.com users
Clustering
– Difference from classification: the classes are not known in advance!
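Because clustering starts with no predefined classes, the algorithm has to discover the groups itself. k-means is one common choice, used here purely for illustration; the sketch assumes hypothetical 2-D customer points (for example, scaled income and spending) and runs a fixed number of iterations instead of testing for convergence:

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Group points into k clusters by alternating assignment and centroid updates."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members (keep it if empty).
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical unlabelled customers, e.g. (scaled income, scaled spending).
CUSTOMERS = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, clusters = kmeans(CUSTOMERS, k=2)
print(clusters)
```

Unlike the classifier, nothing here says what the two groups mean; interpreting them (for example, as "model" customers for target marketing) is left to the analyst.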
Market Analysis and Management
Where are the data sources for analysis?
Credit card transactions, loyalty cards, discount coupons, customer complaint calls, plus (public) lifestyle studies
Target marketing
Find clusters of “model” customers who share the same characteristics: interest, income level, spending habits, etc.
Determine customer purchasing patterns over time
Conversion of a single to a joint bank account: marriage, etc.
Cross-market analysis
Associations/correlations between product sales
Prediction based on the association information
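Cross-market analysis often starts by measuring how closely the sales of two products move together. A minimal sketch of the Pearson correlation coefficient over hypothetical weekly sales figures (Python 3.10+ also provides `statistics.correlation` for the same calculation):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical weekly unit sales of two products in the same store.
crib_sales = [10, 12, 15, 18, 20]
stroller_sales = [4, 5, 6, 7, 8]
r = pearson(crib_sales, stroller_sales)
print(round(r, 3))
```

For this sample the coefficient is about 0.997, a strong positive association; correlation alone does not show that one product's sales drive the other's.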
Data Mining: Confluence of Multiple Disciplines
Contributing disciplines: Database Technology, Machine Learning, Information Science, Statistics, Visualization, Other Disciplines
Data Mining: On What Kinds of Data?
Relational databases
Data warehouses
Transactional databases
Advanced DB and information repositories
– Object-oriented and object-relational databases
– Spatial databases
– Time-series data and temporal data
– Text databases and multimedia databases
– Heterogeneous and legacy databases
– WWW
Data Mining Process
Learning
Collecting relevant data
Model building
Understanding of business
Problem identification
Business strategy and evaluation
Action
Requirements/challenges in Data Mining
User interface
Mining methodology
Performance
Data source
Social and Security
Requirements/challenges in Data Mining (2)
User interface
- Data Visualization
Understandability and interpretation of results
Information representation and rendering
Screen real estate
- Interactivity
Manipulation of mined knowledge
Focus and refine mining tasks
Focus and refine mining results
Requirements/challenges in Data Mining (3)
Mining Methodology
Mining different kinds of knowledge in databases
Interactive mining of knowledge at multiple levels of abstraction
Incorporation of background knowledge
Query languages
Expression and visualization of results
Handling noise and incomplete data
Pattern evaluation
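Pattern evaluation decides which discovered patterns are actually interesting. One standard measure for association rules is lift: the rule's confidence divided by the overall support of its consequent. A lift near 1 means the antecedent says little about the consequent, however high the confidence. A minimal sketch with hypothetical baskets in which milk appears in every transaction:

```python
def lift(baskets, antecedent, consequent):
    """Lift of the rule antecedent -> consequent: confidence / support(consequent)."""
    n = len(baskets)
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        raise ValueError(f"{antecedent!r} never occurs")
    consequent_support = sum(1 for b in baskets if consequent in b) / n
    if consequent_support == 0:
        raise ValueError(f"{consequent!r} never occurs")
    both = sum(1 for b in with_antecedent if consequent in b)
    confidence = both / len(with_antecedent)
    return confidence / consequent_support

# Hypothetical baskets: milk is in every transaction, so any rule "-> milk" has confidence 1.0.
DAIRY_BASKETS = [{"bread", "milk"}, {"bread", "milk"}, {"milk"}, {"milk", "eggs"}, {"bread", "milk"}]
print(lift(DAIRY_BASKETS, "bread", "milk"))
```

The rule bread → milk has 100% confidence but a lift of 1.0, so it reveals nothing beyond the fact that everyone buys milk.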
Requirements/challenges in Data Mining (4)
Performance
Efficiency and scalability of data mining algorithms
Linear-time algorithms needed
Parallel and distributed methods
Incremental methods
Divide and conquer?
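Parallel, distributed, and incremental methods often rely on statistics that can be computed per partition and then merged. A minimal sketch, assuming item-frequency counting as the task and hypothetical transactions: each partition is counted independently (on separate workers in a real system), the partial counts are summed, and a new batch is folded in without rescanning earlier data:

```python
from collections import Counter
from functools import reduce
import operator

def count_partition(transactions):
    """Count item occurrences within one partition of the transactions."""
    return Counter(item for t in transactions for item in t)

def partitioned_counts(transactions, num_partitions):
    """Divide the data, count each part independently, then merge by summing."""
    size = max(1, -(-len(transactions) // num_partitions))  # ceiling division
    parts = [transactions[i:i + size] for i in range(0, len(transactions), size)]
    # map() runs sequentially here; each call is independent, so it could be farmed out.
    return reduce(operator.add, map(count_partition, parts), Counter())

TRANSACTIONS = [{"a", "b"}, {"a"}, {"b", "c"}, {"a", "c"}, {"c"}]
totals = partitioned_counts(TRANSACTIONS, num_partitions=2)

# Incremental update: fold in a new batch without rescanning earlier data.
totals.update(count_partition([{"a", "d"}]))
print(totals)
```

Counting is a convenient case because partial counts merge by simple addition; not every mining statistic decomposes this cleanly, which is why "divide and conquer" remains an open question on the slide.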
Requirements/challenges in Data Mining (5)
Data Source
Diversity of data types
Handling complex types of data
Mining information from heterogeneous databases or information repositories
Can we expect a DM algorithm to do well on all types of data?
Data glut
Are we collecting the right data for the right answer?
Distinguish between important and unimportant data
Requirements/challenges in Data Mining (6)
Social and Security
- Social Impact
Private and sensitive data is gathered and mined without individuals’ knowledge and/or consent
Appropriate use and distribution of discovered knowledge
- Regulations
Need for privacy and DM policies
Data Mining Tools
Summary
The benefits of knowing one’s business are critical; technologies are coming together to support data mining.
Data mining is the process and result of knowledge production, knowledge discovery, and knowledge management.