Transcript Review3

Questions and Topics Review Dec. 10, 2013
1. Compare AGNES/hierarchical clustering with K-means. What are the main differences?
2. K-means has a runtime complexity of O(t*k*n*d), where t is the number of iterations, d is the
dimensionality of the dataset (the number of attributes an object has), k is the number of clusters,
and n is the number of objects in the dataset. Explain! In general, is K-means an efficient clustering
algorithm? Give a reason for this answer by discussing its runtime with reference to its runtime complexity formula! [5]
3. Assume the Apriori-style sequence mining algorithm described on pages 429-435 is used and
the algorithm generated the 3-sequences listed below (see 2007 Final Exam!):
Frequent 3-sequences:
<(1) (2) (3)>
<(1 2 3)>
<(1) (2) (4)>
<(1) (3) (4)>
<(1 2) (3)>
<(2 3) (4)>
<(2) (3) (4)>
<(3) (4 5)>
Candidate Generation:
Candidates that survived pruning:
Christoph F. Eick
Answers Question 1
a. AGNES creates a set of clusterings (a dendrogram); K-means creates a single clustering.
b. K-means forms clusters using an iterative procedure that minimizes an objective function;
AGNES forms the dendrogram by repeatedly merging the two closest clusters until a single cluster is obtained.
c. …
[Figure: example dendrogram with leaves ordered 1, 3, 2, 5, 4, 6; merge-height axis from 0 to 0.2]
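To make answer (b) concrete, here is a minimal AGNES sketch that records every merge, i.e. the whole dendrogram rather than one clustering. It assumes single-link distance (closest pair of members) and Euclidean distance; the answer above does not fix a linkage, and the function name `agnes_single_link` and the sample points are hypothetical.

```python
import math
from itertools import combinations

def agnes_single_link(points):
    """Bottom-up clustering; returns the merge history that defines the dendrogram."""
    clusters = [frozenset([i]) for i in range(len(points))]  # start: one cluster per object

    def cluster_distance(c1, c2):
        # single link (assumption): distance of the closest pair of members
        return min(math.dist(points[i], points[j]) for i in c1 for j in c2)

    merges = []
    while len(clusters) > 1:
        # find and merge the two closest clusters
        a, b = min(combinations(clusters, 2), key=lambda pair: cluster_distance(*pair))
        merges.append((sorted(a), sorted(b), cluster_distance(a, b)))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)
    return merges

pts = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
merges = agnes_single_link(pts)
for left, right, height in merges:
    print(left, right, round(height, 2))
```

Each printed row is one level of the dendrogram (the two merged clusters and the merge height). Cutting the dendrogram at a chosen height yields a single clustering, which is what K-means produces directly.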
Answers Questions 2&3
3) Association Rule and Sequence Mining [15]
a) Assume the Apriori-style sequence mining algorithm described on pages 429-435 is
used and the algorithm generated the 3-sequences listed below. What candidate 4-sequences are
generated from this 3-sequence set? Which of the generated 4-sequences survive the pruning step?
Use the format of Figure 7.6 in the textbook on page 435 to describe your answer! [7]
Frequent 3-sequences:
<(1) (2) (3)>
<(1 2 3)>
<(1) (2) (4)>
<(1) (3) (4)>
<(1 2) (3)>
<(2 3) (4)>
<(2) (3) (4)>
<(3) (4 5)>
Candidate Generation:
<(1) (2) (3) (4)>
<(1 2 3) (4)>    pruned, (1 3) (4) is infrequent
<(1) (3) (4 5)>  pruned, (1) (4 5) is infrequent
<(1 2) (3) (4)>  pruned, (1 2) (4) is infrequent
<(2 3) (4 5)>    pruned, (2) (4 5) is infrequent
<(2) (3) (4 5)>  pruned, (2) (4 5) is infrequent
Candidates that survived pruning:
<(1) (2) (3) (4)>
Grading note. Question: What if the answers are correct, but the explanation part is not given?
Do I need to take any points off?
Response: Give an extra point if the explanation is correct and present; otherwise subtract a point;
more than 2 errors: 2 points or less!
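The candidate table above can be reproduced with a short sketch. It assumes the GSP-style merge rule (s1 and s2 join when dropping the first item of s1 gives the same sequence as dropping the last item of s2; the last item of s2 joins s1's last element if it shared an element in s2, otherwise it is appended as a new element) and prunes a candidate if any sequence obtained by deleting one item is not frequent. All function names are hypothetical; sequences are tuples of item tuples.

```python
def drop_first(seq):
    """Remove the first item of the first element (drop the element if it becomes empty)."""
    head = seq[0][1:]
    return ((head,) if head else ()) + seq[1:]

def drop_last(seq):
    """Remove the last item of the last element (drop the element if it becomes empty)."""
    tail = seq[-1][:-1]
    return seq[:-1] + ((tail,) if tail else ())

def merge(s1, s2):
    """Join s1 and s2 if dropping s1's first item equals dropping s2's last item."""
    if drop_first(s1) != drop_last(s2):
        return None
    item = s2[-1][-1]
    if len(s2[-1]) > 1:                      # last two items of s2 share an element
        return s1[:-1] + (s1[-1] + (item,),)
    return s1 + ((item,),)                   # otherwise append item as a new element

def one_item_deletions(seq):
    """All (k-1)-sequences obtained by deleting exactly one item."""
    for e, element in enumerate(seq):
        for i in range(len(element)):
            rest = element[:i] + element[i + 1:]
            yield seq[:e] + ((rest,) if rest else ()) + seq[e + 1:]

def generate_and_prune(frequent):
    """Return (candidate, first infrequent subsequence or None) pairs."""
    freq = set(frequent)
    results = []
    for s1 in frequent:
        for s2 in frequent:
            cand = merge(s1, s2)
            if cand is not None:
                bad = next((s for s in one_item_deletions(cand) if s not in freq), None)
                results.append((cand, bad))
    return results

def fmt(seq):
    return "<" + " ".join("(" + " ".join(map(str, e)) + ")" for e in seq) + ">"

F3 = [((1,), (2,), (3,)), ((1, 2, 3),), ((1,), (2,), (4,)), ((1,), (3,), (4,)),
      ((1, 2), (3,)), ((2, 3), (4,)), ((2,), (3,), (4,)), ((3,), (4, 5))]
for cand, bad in generate_and_prune(F3):
    print(fmt(cand), "survives" if bad is None else f"pruned, {fmt(bad)[1:-1]} is infrequent")
```

With the eight frequent 3-sequences this prints the six candidates in the order of the table above, and only <(1) (2) (3) (4)> survives.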
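Answer Question 2:
t: number of iterations; k: number of clusters; n: number of objects to be clustered; d: number of attributes.
In each iteration, all n points are compared to the k centroids to assign each point to its nearest
centroid, which requires O(k*n) distance computations; each distance computation has complexity O(d).
Recomputing the centroids afterwards costs O(n*d), which is dominated by the assignment step. Therefore,
the total runtime is O(t*k*n*d). Because t, k and d are usually much smaller than n, the runtime grows
roughly linearly with n, which is why K-means is generally considered efficient.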
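The following sketch makes the loop structure behind O(t*k*n*d) visible by counting distance computations. It assumes a fixed number t of iterations (no convergence test), Euclidean distance and random initial centroids; the function name `kmeans` and the sample data are hypothetical.

```python
import random

def kmeans(points, k, t, seed=0):
    """Plain K-means with a fixed number t of iterations (no early stopping)."""
    rng = random.Random(seed)
    d = len(points[0])
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    assignment = []
    distance_computations = 0
    for _ in range(t):                                     # t iterations
        # assignment step: n objects x k centroids, each distance costs O(d)
        assignment = []
        for p in points:
            best, best_dist = 0, float("inf")
            for j, c in enumerate(centroids):
                dist = sum((p[a] - c[a]) ** 2 for a in range(d))
                distance_computations += 1
                if dist < best_dist:
                    best, best_dist = j, dist
            assignment.append(best)
        # update step: one pass over the n objects plus k divisions, O(n*d + k*d)
        sums = [[0.0] * d for _ in range(k)]
        counts = [0] * k
        for p, j in zip(points, assignment):
            counts[j] += 1
            for a in range(d):
                sums[j][a] += p[a]
        for j in range(k):
            if counts[j]:                                  # keep old centroid if cluster is empty
                centroids[j] = [s / counts[j] for s in sums[j]]
    return centroids, assignment, distance_computations

data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
_, labels, ops = kmeans(data, k=2, t=5)
print(ops)  # t*k*n = 5*2*6 = 60 distance computations
```

The counter equals t*k*n exactly, and each counted computation loops over d attributes, which gives the O(t*k*n*d) bound.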
4. Gaussian Kernel Density Estimation and DENCLUE
a. Assume we have a 2D dataset X containing 4 objects: X={(1,0), (0,1), (1,2), (3,4)}; moreover, we use the
Gaussian kernel density function to measure the density of X. Assume we want to compute the density at
point (1,1); you can also assume h=1 (σ=1) and that we use Manhattan distance as the distance function!
Give a sketch of how the Gaussian Kernel Density Estimation approach determines the density for point (1,1).
Be specific!
b. What is a density attractor? How does DENCLUE form clusters?
5) PageRank [8]
a) What does the PageRank compute? What are the challenges in using the PageRank algorithm in practice? [3]
b) Give the equation system that PAGERANK would use for the webpage structure given below. Give a sketch of
an approach that determines the page rank of the 4 pages from this equation system! [5]
[Figure: link structure of the four web pages P1, P2, P3, P4]
Answer Question 4
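a. The Manhattan distances from (1,1) to the four objects are 1, 1, 1 and 5, so with σ=1 the density of (1,1) is computed as follows:
$f_X((1,1)) = e^{-1/2} + e^{-1/2} + e^{-1/2} + e^{-25/2} = 3e^{-1/2} + e^{-25/2} \approx 1.8196$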
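The arithmetic in answer (a) can be checked with a few lines. This is a sketch of the unnormalized Gaussian kernel sum with Manhattan distance and σ=1, as stated in the question; the function names are hypothetical.

```python
import math

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def gaussian_kde(x, data, sigma=1.0, dist=manhattan):
    """Unnormalized Gaussian kernel density: sum_i exp(-d(x, x_i)^2 / (2 sigma^2))."""
    return sum(math.exp(-dist(x, xi) ** 2 / (2 * sigma ** 2)) for xi in data)

X = [(1, 0), (0, 1), (1, 2), (3, 4)]
density = gaussian_kde((1, 1), X)
print(round(density, 4))  # 1.8196
```

The three neighbours at distance 1 contribute e^{-1/2} each, while (3,4) at distance 5 contributes only e^{-25/2}, which is negligible.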
b. A density attractor is a local maximum of a density function. DENCLUE iterates over
the objects in the dataset and uses hill climbing to associate each object with a density
attractor. Next, it forms clusters such that each cluster contains the objects in the dataset
that are associated with the same density attractor; objects that belong to a cluster whose
density (that of its attractor) is below a user-defined threshold are considered outliers.
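Gaussian kernel and the resulting density function of a dataset D with N objects:
$f_{Gaussian}(x, y) = e^{-\frac{d(x,y)^2}{2\sigma^2}}$
$f^{D}_{Gaussian}(x) = \sum_{i=1}^{N} e^{-\frac{d(x,x_i)^2}{2\sigma^2}}$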
Answers Questions 5 and 6
5a) What does the PageRank compute? What are the challenges in using the PageRank
algorithm in practice? [3]
It computes the probability that a webpage is accessed. [1]
As there are a very large number of webpages and links, finding an efficient, scalable algorithm is a
major challenge. [2]
5b) Give the equation system that PAGERANK would use for the webpage structure
given below. Give a sketch of an approach that determines the page rank of the 4 pages
from this equation system! [5]
PR(P1)= (1-d) + d * (PR(P3)/2 + PR(P4)/3)
PR(P2)= (1-d) + d * (PR(P3)/2 + PR(P4)/3 + PR(P1))
PR(P3)= (1-d) + d*PR(P4)/3
PR(P4)=1-d
One solution: Initialize all page ranks with 1 [0.5] and then update the PageRank of each
page using the above 4 equations until the values converge [1].
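The iterative solution can be sketched directly from the four equations above. The damping factor d=0.85 and the stopping tolerance are assumptions (the answer leaves d symbolic), and `pagerank_iterate` is a hypothetical name; all pages start with rank 1 as in the suggested solution.

```python
def pagerank_iterate(d=0.85, tol=1e-10, max_iter=100):
    """Repeatedly apply the 4-page equation system until the ranks stop changing."""
    pr = {"P1": 1.0, "P2": 1.0, "P3": 1.0, "P4": 1.0}   # initial ranks
    for _ in range(max_iter):
        new = {
            "P1": (1 - d) + d * (pr["P3"] / 2 + pr["P4"] / 3),
            "P2": (1 - d) + d * (pr["P3"] / 2 + pr["P4"] / 3 + pr["P1"]),
            "P3": (1 - d) + d * pr["P4"] / 3,
            "P4": 1 - d,
        }
        delta = max(abs(new[p] - pr[p]) for p in pr)
        pr = new
        if delta < tol:                                  # converged
            break
    return pr

ranks = pagerank_iterate()
for page, rank in ranks.items():
    print(page, round(rank, 4))
```

Because P4 has no incoming links, its rank is fixed at 1-d after the first update, and the other ranks settle within a few iterations.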
6) A Delaunay triangulation for a set P of points in a plane is a triangulation DT(P)
such that no point in P is inside the circumcircle of any triangle in DT(P).
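The empty-circumcircle definition can be turned into a brute-force construction for small point sets: keep every triangle whose circumcircle contains no other point. This sketch assumes points in general position (no four points on a common circle) and uses the standard in-circle determinant; it runs in O(n^4) and only illustrates the definition. Function names and points are hypothetical.

```python
from itertools import combinations

def orient(a, b, c):
    """> 0 if a, b, c are counter-clockwise, < 0 if clockwise, 0 if collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def in_circumcircle(a, b, c, d):
    """True if d lies strictly inside the circumcircle of triangle a, b, c."""
    adx, ady = a[0] - d[0], a[1] - d[1]
    bdx, bdy = b[0] - d[0], b[1] - d[1]
    cdx, cdy = c[0] - d[0], c[1] - d[1]
    det = ((adx * adx + ady * ady) * (bdx * cdy - bdy * cdx)
           - (bdx * bdx + bdy * bdy) * (adx * cdy - ady * cdx)
           + (cdx * cdx + cdy * cdy) * (adx * bdy - ady * bdx))
    return det * orient(a, b, c) > 0   # sign correction for clockwise triangles

def delaunay_brute_force(points):
    triangles = []
    for i, j, k in combinations(range(len(points)), 3):
        a, b, c = points[i], points[j], points[k]
        if orient(a, b, c) == 0:
            continue                    # skip degenerate (collinear) triples
        if not any(in_circumcircle(a, b, c, points[m])
                   for m in range(len(points)) if m not in (i, j, k)):
            triangles.append((i, j, k))
    return triangles

pts = [(0, 0), (4, 0), (0, 4), (3, 3)]
print(delaunay_brute_force(pts))  # [(0, 1, 3), (0, 2, 3)]
```

For these four points, triangle (0, 1, 2) is rejected because (3,3) lies inside its circumcircle, so the diagonal from (0,0) to (3,3) is used instead.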
6. What is a Delaunay triangulation?
7. SVM
a) The soft margin support vector machine solves the following optimization problem:
What does the second term minimize? Depict all non-zero ξi in the figure below! What is the advantage of the soft
margin approach over the linear SVM approach? [5]
b) Referring to the figure above, explain how examples are classified by SVMs! What is the relationship between
ξi and example i being classified correctly? [4]
Answer Question 7
a. It minimizes the error, which is measured as the distance to the class's hyperplane
for points that are on the wrong side of that hyperplane [1.5]. Depict [2; distances
to the wrong hyperplane: at most 1 point]. The soft margin approach can deal with classification
problems in which the examples are not linearly separable [1.5].
b. The middle hyperplane is used to classify the examples [1.5]. If ξi is less than or equal to
half of the width of the margin, the example is classified correctly.
The length of the arrow for point i is the value of ξi; for points i without an arrow, ξi=0.
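A small sketch of answer (b), assuming the standard hinge-slack definition ξi = max(0, 1 - yi(w·xi + b)) and a given, hypothetical (w, b) rather than a trained SVM. In this normalization the geometric distance of a point to its own class's hyperplane is ξi/||w|| and half the margin width is 1/||w||, so "ξi/||w|| below half the margin" corresponds to ξi < 1. The function name and examples are illustrative only.

```python
import math

def soft_margin_report(w, b, examples):
    """For each (x, y), return (x, y, predicted label, slack xi, geometric distance xi/||w||)."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    rows = []
    for x, y in examples:
        score = sum(wi * xi for wi, xi in zip(w, x)) + b   # signed value w.x + b
        slack = max(0.0, 1.0 - y * score)                  # hinge slack xi_i
        predicted = 1 if score >= 0 else -1                # side of the middle hyperplane
        rows.append((x, y, predicted, slack, slack / norm))
    return rows

w, b = (1.0, 0.0), 0.0                                     # hypothetical separating hyperplane
examples = [((2, 0), 1), ((0.5, 0), 1), ((-0.5, 0), 1), ((-3, 0), -1)]
rows = soft_margin_report(w, b, examples)
for x, y, pred, slack, dist in rows:
    print(x, "true", y, "pred", pred, "xi", slack, "correct" if pred == y else "wrong")
```

The point (0.5, 0) lies inside the margin (ξ = 0.5) but on the correct side of the middle hyperplane, while (-0.5, 0) has ξ = 1.5 and is misclassified.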