Introduction - Department of Computer Science • NJIT
Machine Learning
Usman Roshan
Dept. of Computer Science
NJIT
What is Machine Learning?
• “Machine learning is programming computers to optimize a performance criterion using example data or past experience.” (Introduction to Machine Learning, Alpaydin, 2010)
• Examples:
– Facial recognition
– Digit recognition
– Molecular classification
A little history
• 1946: ENIAC, the first general-purpose electronic computer, is built to perform numerical computations
• 1950: Alan Turing proposes the Turing test. Can machines
think?
• 1952: First checkers-playing program, written by Arthur Samuel at IBM. Later came knowledge-based systems such as ELIZA and MYCIN.
• 1957: Perceptron developed by Frank Rosenblatt. Perceptrons can be combined to form a neural network.
• Early 1990s: Statistical learning theory emphasizes learning from data instead of rule-based inference.
• Current status: used widely in industry; a combination of various approaches, but data-driven methods are prevalent.
Example up-close
• Problem: Recognize images representing digits
0 through 9
• Input: High dimensional vectors representing
images
• Output: 0 through 9 indicating the digit the
image represents
• Learning: Build a model from “training data”
• Predict “test data” with the model (a minimal sketch follows)
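To make this concrete, here is a minimal Python sketch of the build-then-predict workflow. The 8x8 digits dataset bundled with scikit-learn and the k-nearest-neighbor model are illustrative choices, not the course's assigned setup:

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

digits = load_digits()  # each 8x8 image is flattened into a 64-dimensional vector
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

model = KNeighborsClassifier(n_neighbors=3)  # the "model" built from training data
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))  # predict the test data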
Data model
• We assume that the data is represented by a set of
vectors each of fixed dimensionality.
• Vector: an ordered list of numbers
• We may refer to each vector as a datapoint and each
dimension as a feature
• Example:
– A bank wishes to classify loan applicants as risky or safe
– Each applicant is a datapoint represented by a vector
– Features may be age, income, mortgage/rent, education, family, current loans, and so on (see the toy example below)
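A toy numpy sketch of this data model; every number and label below is made up for illustration:

import numpy as np

# Columns (features): age, income (k$/yr), monthly mortgage/rent ($), years of education
X = np.array([[35,  60, 1200, 16],
              [52, 110, 2500, 18],
              [23,  30,  900, 12]])
y = np.array(["safe", "safe", "risky"])  # hypothetical class labels

print(X.shape)  # (3, 4): 3 datapoints, each a vector of 4 features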
Machine learning resources
• Data
– NIPS 2003 feature selection contest
– mldata.org
– UCI machine learning repository
• Contests
– Kaggle
• Software
– Python scikit-learn
– R
– Your own code
Machine Learning techniques we will learn in the course
Bayesian classification:
Univariate and multivariate
Linear regression
Maximum likelihood estimation
Naïve-Bayes
Feature selection
Dimensionality reduction:
PCA
Fisher discriminant
Maximum margin criterion
Clustering
Nearest neighbor
Perceptron and neural networks
Linear discrimination:
Logistic regression
Support vector machines
Kernel methods
Regularized risk minimization
Hidden Markov models
Decision trees and random forests (if
time permits)
Advanced topics (if time permits):
Boosting
Deep learning
Textbook
• Not required but highly recommended for beginners
• Introduction to Machine Learning by Ethem Alpaydin (2nd edition, 2010, MIT Press). Written by a computer scientist; the material is accessible with a basic probability and linear algebra background.
• Applied Predictive Modeling by Kuhn and Johnson (2013, Springer). A more recent book that focuses on practical modeling.
Some practical techniques
• Combination of various methods
• Parameter tuning
– Trade-off between error and model complexity
• Data pre-processing
– Normalization
– Standardization
• Feature selection
– Discarding noisy features (these pre-processing steps are sketched below)
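A minimal Python sketch of these steps on made-up numbers; MinMaxScaler, StandardScaler, and VarianceThreshold are one common scikit-learn way to do each, chosen here purely for illustration:

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.feature_selection import VarianceThreshold

# Made-up data: three datapoints, three features; the last feature is constant (uninformative).
X = np.array([[1.0, 200.0, 5.0],
              [2.0, 300.0, 5.0],
              [3.0, 400.0, 5.0]])

X_norm = MinMaxScaler().fit_transform(X)   # normalization: rescale each feature to [0, 1]
X_std = StandardScaler().fit_transform(X)  # standardization: zero mean, unit variance
X_sel = VarianceThreshold().fit_transform(X)  # feature selection: drop zero-variance features
print(X_sel.shape)  # (3, 2): the constant third feature is discarded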
Background
• Basic linear algebra and probability
– Vectors
– Dot products
– Eigenvectors and eigenvalues
• See Appendix of textbook for probability
background
– Mean
– Variance
– Gaussian/Normal distribution (see the numpy refresher below)
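For a quick self-check of this background, here is a short numpy refresher covering the items above; the numbers are arbitrary:

import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])
print(np.dot(u, v))  # dot product: 1*4 + 2*5 + 3*6 = 32.0

A = np.array([[2.0, 0.0],
              [0.0, 3.0]])
vals, vecs = np.linalg.eig(A)  # eigenvalues and eigenvectors of A
print(vals)  # [2. 3.]; the columns of vecs are the eigenvectors

x = np.random.default_rng(0).normal(loc=0.0, scale=1.0, size=10000)  # Gaussian samples
print(x.mean(), x.var())  # sample mean ~ 0, sample variance ~ 1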
Assignments
• Implementation of basic classification
algorithms with Perl and Python
– Nearest Means (sketched below)
– Naïve Bayes
– K nearest neighbor
– Cross validation scripts
• Experiment with various algorithms on
assigned datasets
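As a taste of the first assignment topic, here is a minimal plain-numpy sketch of a nearest-means classifier: predict the class whose training mean is closest to the test point. The function names and toy data are illustrative assumptions, not the assignment's required interface:

import numpy as np

def nearest_means_fit(X, y):
    # Compute the mean vector of the training points in each class.
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def nearest_means_predict(means, X):
    # Assign each row of X to the class whose mean is closest (Euclidean distance).
    classes = list(means)
    centers = np.stack([means[c] for c in classes])
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return np.array([classes[i] for i in dists.argmin(axis=1)])

# Toy usage with made-up 2-D data
X = np.array([[0.0, 0.0], [1.0, 1.0], [9.0, 9.0], [10.0, 10.0]])
y = np.array([0, 0, 1, 1])
means = nearest_means_fit(X, y)
print(nearest_means_predict(means, np.array([[0.5, 0.5], [9.5, 9.5]])))  # -> [0 1]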
Project
• Some ideas:
– Experiment with Kaggle and NIPS 2003 feature
selection datasets
– Experimental performance study of various
machine learning techniques on a given dataset.
For example, a comparison of feature selection methods with a fixed classifier.
Exams
• One midterm exam
• Final exam
• What to expect on the exams:
– Basic conceptual understanding of machine
learning techniques
– Be able to apply techniques to simple datasets
– Basic runtime and memory requirements
– Simple modifications
Grade breakdown
• Assignments and project worth 50%
• Exams worth 50%