Mining Uncertain Data with Probabilistic Guarantees

MINING UNCERTAIN DATA WITH
PROBABILISTIC GUARANTEES
Liwen Sun, Reynold Cheng, David W. Cheung,
Jiefeng Cheng
SIGKDD 2010
1
Outline
Motivation
Problem Definition
Method
  P-Apriori Algorithm
  TODIS Algorithm
  Probabilistic Association Rules
Experimental Result
Conclusion

2
Motivation
The goals of this paper are:
(1) to propose a definition of frequent patterns and
association rules for the tuple uncertainty model;
(2) to develop efficient algorithms for mining
patterns and rules that are correct under Possible
World Semantics (PWS).

3
The Possible World
P(W2) = T1.p * (1 - T2.p) * (1 - T3.p) * T4.p = 0.6 * 0.5 * 0.3 * 1 = 0.09
P(W4) = (1 - T1.p) * (1 - T2.p) * T3.p * T4.p = 0.4 * 0.5 * 0.7 * 1 = 0.14
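
The two products above can be checked with a short Python sketch. It assumes, as the factors suggest, that tuples exist independently, that W2 contains T1 and T4, and that W4 contains T3 and T4. The names `world_probability` and `possible_worlds` are hypothetical helpers, not taken from the paper.

```python
from itertools import product

# Existence probabilities of T1..T4, read off the arithmetic on the slide.
probs = {"T1": 0.6, "T2": 0.5, "T3": 0.7, "T4": 1.0}


def world_probability(present, probs):
    """Probability of the world in which exactly the tuples in `present` exist.

    Assumes tuples exist independently of one another.
    """
    p = 1.0
    for name, q in probs.items():
        p *= q if name in present else (1.0 - q)
    return p


def possible_worlds(probs):
    """Yield (world, probability) for every world with nonzero probability."""
    names = list(probs)
    for mask in product([True, False], repeat=len(names)):
        world = frozenset(n for n, keep in zip(names, mask) if keep)
        p = world_probability(world, probs)
        if p > 0:
            yield world, p


# Assumed membership of the two worlds shown on the slide.
w2 = world_probability({"T1", "T4"}, probs)
w4 = world_probability({"T3", "T4"}, probs)

# The probabilities of all possible worlds should sum to 1.
total = sum(p for _, p in possible_worlds(probs))
```

Because T4 has probability 1, only the 8 worlds that contain T4 have nonzero probability. Enumeration grows exponentially with the number of uncertain tuples, which is why it is only practical for tiny examples like this one.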
4
Probabilistic Frequent Patterns (P-FP)
X: a pattern
sup(X): support count of pattern X
Example: Pr(sup({a}) = 1) = 0.29
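
Under PWS the support count is a random variable, so it has a probability mass function (pmf). The sketch below computes that pmf by brute-force enumeration of worlds. The slide does not say which tuples contain the item a; the code assumes a occurs in T1, T2 and T3 but not T4, because that choice reproduces the value 0.29. The helper names and the threshold used in the usage note are illustrative only.

```python
from itertools import product

# Existence probabilities of T1..T4 (from the possible-world slide).
probs = [0.6, 0.5, 0.7, 1.0]

# ASSUMPTION: {a} appears in T1, T2, T3 and not in T4. This is inferred
# from the value 0.29, not stated in the source.
contains_a = [True, True, True, False]


def support_pmf(probs, contains):
    """Return pmf where pmf[k] = Pr(sup(X) = k), by enumerating all worlds."""
    pmf = [0.0] * (len(probs) + 1)
    for mask in product([True, False], repeat=len(probs)):
        p = 1.0
        for q, keep in zip(probs, mask):
            p *= q if keep else (1.0 - q)
        # Support = number of existing tuples that contain the pattern.
        sup = sum(1 for keep, has in zip(mask, contains) if keep and has)
        pmf[sup] += p
    return pmf


def frequentness_probability(pmf, minsup):
    """Pr(sup(X) >= minsup), the quantity compared against a probability threshold."""
    return sum(pmf[minsup:])


pmf_a = support_pmf(probs, contains_a)
```

With these assumptions `pmf_a[1]` is 0.29. A pattern would then count as a probabilistic frequent pattern when `frequentness_probability(pmf_a, minsup)` reaches a user-chosen probability threshold; both thresholds are parameters of the mining task, and no particular values are implied here.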

5
Probabilistic Frequent Patterns (P-FP)

6
Probabilistic Association Rule (P-AR)

7
Pruning Infrequent Patterns

8
Dynamic-Programming Algorithm
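
The slide content for this algorithm did not survive extraction. As a rough illustration only, the sketch below uses the standard dynamic-programming recurrence for the number of successes among independent trials, applied to the tuples that contain a pattern. It may differ in detail from the algorithm in the paper, and `support_pmf_dp` is a hypothetical name.

```python
def support_pmf_dp(probs):
    """Return pmf where pmf[k] = Pr(exactly k of the given tuples exist).

    `probs` holds the existence probabilities of the tuples that contain
    the pattern; tuples are assumed independent. Runs in O(n^2) time.
    """
    pmf = [1.0]  # with no tuples processed, support is 0 with certainty
    for q in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - q)  # tuple absent: support unchanged
            nxt[k + 1] += mass * q      # tuple present: support grows by 1
        pmf = nxt
    return pmf


# Illustrative input: three tuples with probabilities 0.6, 0.5 and 0.7.
pmf = support_pmf_dp([0.6, 0.5, 0.7])
```

Each tuple is processed once and updates every entry of the running pmf, so no possible world is ever materialized.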
9
Divide-and-Conquer Algorithm
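
The slide content for this algorithm is also missing. A minimal sketch of one common divide-and-conquer formulation follows: split the tuples that contain the pattern into two halves, compute the support pmf of each half recursively, and combine the halves by convolution. This is an assumption about the general shape of the approach rather than the paper's exact procedure, and the function names are hypothetical.

```python
def convolve(a, b):
    """Plain O(len(a) * len(b)) convolution of two pmfs."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def support_pmf_dc(probs):
    """Support pmf of independent tuples via divide and conquer."""
    if not probs:
        return [1.0]
    if len(probs) == 1:
        return [1.0 - probs[0], probs[0]]
    mid = len(probs) // 2
    left = support_pmf_dc(probs[:mid])
    right = support_pmf_dc(probs[mid:])
    # Supports of the two halves are independent, so their pmfs convolve.
    return convolve(left, right)


# Illustrative input: three tuples with probabilities 0.6, 0.5 and 0.7.
pmf = support_pmf_dc([0.6, 0.5, 0.7])
```

With the plain convolution used here the total cost is still quadratic. The benefit of this structure appears when the convolution step is replaced by a fast method such as an FFT-based one.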
10
Inverted Probability List

Lx: Inverted Probability List
11
TODIS Algorithm

Phase 1: Generate candidate patterns.

Phase 2: Top-down support inheritance.
12
Computing Association Rule Probability

13
Computing Association Rule Probability
14
Deriving Association Rules

15
Experimental Result
16
Conclusion

We studied efficient algorithms for extracting
frequent patterns from probabilistic databases.
The TODIS algorithm, when used together with
DC, yields the best performance.
17