Special Topics in Data Mining Applications
Focus on: Text Mining
INFS-795 Spring 2005 -- GMU
General Info
• Instructor: Carlotta Domeniconi
  – Office: S&T2, Rm 449
  – Email: [email protected]
  – Phone: (703) 993-1697
• Office hours: Tue 4-6pm, or by appointment
• http://www.ise.gmu.edu/~carlotta/
• Visit the class webpage often!
Course Format
• Lectures by the instructor;
• One midterm;
• Paper presentations by students;
• One project:
  – Project proposal;
  – Project presentation;
  – Project paper.
Important Dates
• March 10: Project proposal due;
• March 24: Midterm Exam;
• March 31: Students’ presentations start;
• May 12: Paper on the project due.
Visit the class webpage often!
The final grade is based on…
• Midterm: 25%
• Paper presentation: 15%
• Project (proposal, presentation, paper): 50%
• Participation in class and quizzes on papers presented: 10%
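The weights above sum to 100%, so the final grade is a weighted average of the four components. A minimal sketch of that arithmetic follows; the function name, the dictionary keys, the 0-100 score scale, and the example scores are all assumptions made for illustration and do not come from the course materials.

```python
# Weights taken from the grading breakdown above (fractions of the final grade).
WEIGHTS = {
    "midterm": 0.25,
    "paper_presentation": 0.15,
    "project": 0.50,
    "participation_and_quizzes": 0.10,
}


def final_grade(scores):
    """Return the weighted final grade.

    Assumption: every component score is on a 0-100 scale.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {
    "midterm": 80,
    "paper_presentation": 90,
    "project": 85,
    "participation_and_quizzes": 100,
}
print(final_grade(example))
```

Because the project carries half of the weight, a change in the project score moves the final grade twice as much as the same change in the midterm score.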
Course Overview
• Classification:
  – Bayes decision theory
  – Density estimation; Discriminant analysis
  – Decision trees; Nearest neighbors
  – Curse of dimensionality
  – Dimensionality reduction:
    • Principal Component Analysis (PCA)
    • Linear Discriminant Analysis (LDA)
  – Support Vector Machines
Course Overview
• Clustering:
  – Basics
  – Distance measures
  – K-means
  – Subspace clustering
Course Overview
• Text categorization:
  – Document representation;
  – Latent semantic indexing;
  – Unsupervised and supervised feature selection;
  – Feature weighting;
  – Similarity measures;
  – Semantic distances;
  – Kernel methods;
  – Detecting spam email.
Course Overview
• Presentation/Discussion of papers (list of papers provided);
• Project proposals;
• Project presentations;
• Paper on the project.
We will study and learn…
• Fundamental principles and techniques in data mining / machine learning;
• Problems that arise in:
  – Document classification;
• Existing approaches in data mining to address these problems;
• Their limitations;
• Can we do better?
Some useful books
• On Pattern Classification:
  – R. O. Duda, P. E. Hart, D. G. Stork, “Pattern Classification”, Second Edition, Wiley, 2001.
• On Document Classification:
  – S. Chakrabarti, “Mining the Web: Discovering Knowledge from Hypertext Data”, Elsevier Science, 2003.
  – Thorsten Joachims, “Learning to Classify Text using Support Vector Machines”, Kluwer, 2002.
• On Text Retrieval:
  – M. Berry and M. Browne, “Understanding Search Engines: Mathematical Modeling and Text Retrieval”, SIAM, 1999.
• On Statistical Learning:
  – T. Hastie, R. Tibshirani, and J. Friedman, “The Elements of Statistical Learning: Data Mining, Inference and Prediction”, Springer, 2001. (Last Print!)