Transcript ARCS-Poster

Scholar: Andrew Emmott
Focus: Machine Learning
Advisors: Tom Dietterich, Prasad Tadepalli
Donors: Leslie and Mark Workman
ANOMALY DETECTION
What is Anomaly Detection?
Anomaly Detection is the practice of identifying the
strangest data points (items, people, events, etc.) from a
large selection of normal ones.
This is sometimes called statistical outlier detection (where, given a large collection of data, we ask which items are the least likely), but in our work we prefer the more general idea of anomalies, because most applications are concerned with identifying some rare class of events that is influenced by some external process.
[Figure: one point labeled "Anomaly" among points labeled "Normal"]
For example, consider a fluke football game in which the underdog team won by a very large margin. This event was unlikely, but we might only consider it a true anomaly if we learned that the favored team had thrown the match.
If it was an honest game, then it was still generated by the normal football game process; it was simply an unlikely outcome. A thrown match is a true anomaly because it is caused by forces outside normal football.
A more serious example: consider that a cyber criminal might go to great lengths to make sure all of their actions appear normal. They are a rare class of person and are not following the same rules that normal people follow, but they might not seem weird at first glance.
Thus, it is not easy to establish an all-purpose definition of anomalies.
[Figure: "FACE" and "NOT A FACE" examples]
You might be familiar with some applications of Artificial Intelligence, such as face recognition. These problems are known as supervised learning problems because the machine is supervised: it is told which images contain faces and which don't. It simply finds rules that distinguish faces from not-faces.
Anomaly detection is an unsupervised learning problem. We want to develop algorithms that can distinguish normal from not-normal without being told which is which. This is more difficult but also more practical; we are asking the machine to find things we are already having trouble finding on our own!
Methodologies Work
Anomaly detection is still a nascent field, and there is a lack of consistency in how algorithms are evaluated. The bulk of my work has been exploring the efficacy of various evaluation methodologies.
Different applications might have different characteristics and might allow different algorithms to succeed. We have urged a more careful and precise quantification of the qualities of individual anomaly detection domains.
In simpler terms, we try to quantify the following things about any anomaly detection domain:
• Point difficulty – how difficult is it to distinguish the anomalies from the normal points?
  • Insider threats want to blend in: “Keep your distance. But don’t look like you’re trying to keep your distance.”
  • Jet engine failures do not.
• Relative frequency – how much of the data is anomalous?
  • Signal failures might be relatively frequent.
  • Insider threats might be very, very rare.
• Semantic variation – how similar are the anomalies to each other?
  • Are they a de facto class?
  • Or are they simply not normal?
• Feature relevance/irrelevance – how well do statistical outliers in the data map to the application target?
  • An 8-foot-tall man is an outlier.
  • But he is not an insider threat.
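The statistical outlier view described under What is Anomaly Detection? ("given a large collection of data, we ask which items are the least likely") can be illustrated with a minimal sketch. For illustration only, the sketch assumes that the data are well described by a single multivariate Gaussian, and it ranks each point by its negative log-likelihood. The function name gaussian_outlier_scores and the height figures below are hypothetical. This is not a method taken from this work.

```python
import numpy as np


def gaussian_outlier_scores(X):
    """Negative log-likelihood of each row of X under one Gaussian fitted to all rows.

    Illustrative assumption: a single multivariate Gaussian describes the data.
    Higher score = less likely = more of a statistical outlier.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a 1-D array as n points with one feature
    d = X.shape[1]
    mu = X.mean(axis=0)
    # sample covariance, plus a tiny ridge so the linear solve below cannot fail
    cov = np.atleast_2d(np.cov(X, rowvar=False)) + 1e-9 * np.eye(d)
    diff = X - mu
    # squared Mahalanobis distance of every row from the mean
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (maha + logdet + d * np.log(2 * np.pi))


# hypothetical heights in feet: 1000 ordinary people plus one 8-foot-tall man
rng = np.random.default_rng(0)
heights = np.append(rng.normal(5.6, 0.3, size=1000), 8.0)
print(int(np.argmax(gaussian_outlier_scores(heights))))  # → 1000, the 8-foot-tall man
```

The usage example illustrates the Feature relevance point. A pure likelihood score singles out the tallest person, but nothing in the model knows whether height has anything to do with the application (an 8-foot-tall man is an outlier, but not an insider threat).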
What Are the Applications?
There are many real-world applications of anomaly detection. In general, whenever you might wish to
identify things that are not normal, anomaly detection might be able to help!
Some examples:
• Cyber Security & Insider Threat Detection
  • Normal: Regular Employees
  • Not Normal: Data Thief
• Machine Failure Prevention
  • Normal: Functioning Jet Engine
  • Not Normal: Failing Jet Engine
  • (Also: Elevators, Hard Drives, etc.)
• Medical Prognosis
  • Normal: Healthy Cells
  • Not Normal: Cancer Cells
• Surveillance
  • Normal: Bank teller is approached with a deposit slip.
  • Not Normal: Bank teller is approached with a gun.
Algorithms Work
We have also worked on developing anomaly detection algorithms of our own. A high-level summary of the algorithms I have explored:
• Cross Prediction
If our data is described by some number of features, we can treat each of those features as its own supervised learning problem: predict it from the remaining features, and look for points whose features are poorly predicted by the others. In other words, find weirdness through cross-examination.
Example: If two people buy five copies of The Catcher in the Rye, that might seem weird. If one of them owns a bookstore and the other is unemployed, maybe only one of them seems strange.
[Figure: Cross Prediction (labels x1 to x4 and p1 to p4)]
• Repeated Impossible Discrimination Ensemble (RIDE)
Many classification techniques can not only discriminate between classes but also provide a measure of confidence in their decisions. If we randomly separate the data into two indiscernible faux-classes, the machine should have low confidence for most points.
But what of the points where the machine has high confidence, even on an “impossible” and meaningless classification task? These points are more easily distinguished from all the rest and are therefore not normal.
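The Cross Prediction idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation explored in this work. The poster does not say which predictor is used, so here each feature is predicted from the others with ordinary least-squares linear regression. Each residual is scaled by a robust spread estimate, and the squared scaled residuals are summed into one score per point. The function name cross_prediction_scores and the synthetic data are hypothetical.

```python
import numpy as np


def cross_prediction_scores(X):
    """Anomaly score per row: how badly each feature is predicted by the others.

    Assumption for illustration: every per-feature predictor is an ordinary
    least-squares linear regression with an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    scores = np.zeros(n)
    for j in range(d):
        target = X[:, j]
        # design matrix: an intercept column plus every feature except j
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        # robust spread: median absolute deviation, rescaled to match a Gaussian std
        mad = 1.4826 * np.median(np.abs(resid - np.median(resid)))
        scores += (resid / max(mad, 1e-12)) ** 2
    return scores


rng = np.random.default_rng(0)
x1 = rng.normal(size=300)
x2 = 2.0 * x1 + rng.normal(scale=0.1, size=300)  # x2 closely tracks x1
x3 = rng.normal(size=300)                        # x3 is unrelated to the others
X = np.column_stack([x1, x2, x3])
# planted point: each value is ordinary on its own, but x2 contradicts x1
X = np.vstack([X, [1.0, -2.0, 0.0]])
print(int(np.argmax(cross_prediction_scores(X))))  # → 300
```

As in the bookstore example, the planted point is flagged because of how its features relate to each other, not because any single value is extreme. A per-feature outlier test would not single it out.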
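The RIDE idea can be sketched in the same hedged spirit. The poster does not name the classifier or the confidence measure, so this sketch makes several assumptions. The classifier is a linear discriminant (Gaussian classes with a shared covariance) fitted in closed form. Confidence is the distance of the predicted class probability from 0.5. The same points the classifier was trained on are scored, and the scores are averaged over many random balanced splits. The function name ride_scores and its parameter values are hypothetical.

```python
import numpy as np


def ride_scores(X, n_rounds=200, seed=0):
    """Average classifier confidence over random, meaningless two-class splits.

    Assumptions for illustration: a linear discriminant with pooled covariance
    and equal class priors; confidence is |P(class 1 | x) - 0.5|.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    rng = np.random.default_rng(seed)
    scores = np.zeros(n)
    for _ in range(n_rounds):
        # "impossible" task: a random, balanced labelling that carries no signal
        y = rng.permutation(np.arange(n) % 2)
        X0, X1 = X[y == 0], X[y == 1]
        mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
        # pooled within-class covariance, lightly regularised
        pooled = ((X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)) / (n - 2)
        pooled += 1e-6 * np.eye(d)
        w = np.linalg.solve(pooled, mu1 - mu0)
        logit = (X - 0.5 * (mu0 + mu1)) @ w  # log-odds of class 1 under equal priors
        # |sigmoid(z) - 0.5| equals 0.5 * |tanh(z / 2)|, which cannot overflow
        scores += 0.5 * np.abs(np.tanh(0.5 * logit))
    return scores / n_rounds


rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(200, 2)), [[8.0, 8.0]]])  # planted outlier is row 200
scores = ride_scores(X)
print(int(np.argmax(scores)))
```

On data like this, most points should get scores near zero. The planted point is expected to stand out, because the discriminant can separate it from the rest even when the labels are meaningless. A more flexible classifier would be a natural substitute, but it would need held-out scoring so that it cannot simply memorize the random labels.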
Acknowledgements:
Funding for my research is provided by the U.S. Army Research Office (ARO) and Defense Advanced Research
Projects Agency (DARPA) under Contract Number W911NF-11-C-0088. The content of the information in this document
does not necessarily reflect the position or the policy of the Government, and no official endorsement should be
inferred. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes
notwithstanding any copyright notation hereon.
Additional funding for my research is provided by an ARCS Scholar Award from the Portland Chapter of
the ARCS Foundation. Thanks!