Experience-Based Access Management (EBAM)
Modeling and Detecting Anomalous Topic Access
Siddharth Gupta 1, Casey Hanson 2, Carl A. Gunter 3, Mario Frank 4, David Liebovitz 5, Bradley Malin 6
1,2,3,4 Department of Computer Science; 3,5 Department of Medicine; 6 Department of Biomedical Informatics
1,2,3 University of Illinois at Urbana-Champaign; 4 University of California, Berkeley; 5 Northwestern University; 6 Vanderbilt University
Outline of the talk
• Motivation and Challenges
• Our Contributions
• Dataset Description
• Random Topic Access (RTA) Model
• Random Topic Access Detection (RTAD) Model
• Evaluation and Results
EMR Access Breach
Reported in April 2013
• The University of Florida: 2 offenders illegitimately accessed the records of 15,000 patients over 3 years (March 2009 to October 2012).
• Personal information, including names, addresses, dates of birth, medical record numbers, and Social Security numbers, was compromised for the purposes of billing fraud.
• One of the offenders was an insider in the hospital without prior.
• How can we efficiently model and detect these types of attacks in the healthcare system?
Motivation
• Two broad classes of threats:
• Inside Threats: behaviors of hospital users (staff) that adversely affect the healthcare institution, such as financial fraud, medical identity theft, and curiosity accesses to EMRs.
• Outside Threats: an outside entity hires an insider to commit fraud, a visitor accesses records on open computers, or an untrustworthy patient seeks information from other patients' records.
• Ramifications: irreversible violation of patient privacy and subsequent high cost for hospitals.
• Deterrent: the current legal deterrent is a set of regulations, such as HIPAA and HITECH, which impose specific privacy rules for patients and financial penalties for violating them.
Classical Detection Methodologies
• Build a classifier on labeled data to differentiate anomalous users from legitimate users.
• Real healthcare data is not labeled.
• Current methods use injection of synthetic anomalous users and evaluate on them.
Random Object Access
• In healthcare information systems, the primary mechanism for generating anomalous users is to associate users with random patients in the dataset.
• We call this approach random object access (ROA).
• The resulting user doesn’t appear to be a plausible attacker in the real hospital setting.
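The ROA baseline described on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' generator: the function name, the patient IDs, and the number of accesses are hypothetical, and uniform sampling with replacement is an assumption.

```python
import random

def random_object_access_user(patient_ids, n_accesses, seed=None):
    """Build one synthetic ROA user: patients drawn uniformly at random."""
    rng = random.Random(seed)
    # Assumption: sample with replacement, since a user may reopen a record.
    return [rng.choice(patient_ids) for _ in range(n_accesses)]

# Hypothetical patient IDs, for illustration only.
patients = [f"patient_{i}" for i in range(1000)]
fake_user = random_object_access_user(patients, n_accesses=50, seed=0)
```

Because every patient is equally likely, the accessed records share no common theme, which is why such a user does not look like a plausible attacker.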
Our Contributions
• Random Topic Access (RTA): we introduce and study a random topic access (RTA) model aimed at users whose access may be illegitimate but is not fully random, because it is focused on common semantic themes.
• User Simulation: we use the latent topic framework to simulate illegitimate users, modeling them as samples from a Dirichlet distribution over topic multinomials.
• Anomaly Detection Framework: we use RTA to detect and evaluate users with suspicious access patterns.
Data Set
Fig. a) Summary Statistics for Audit Logs
Fig. b) Summary Statistics for Patient Records
Random Topic Access (RTA) Model
• Random Topic Access (RTA) Model: a mechanism for utilizing latent topic structures to represent real users in the population and allow for the synthetic generation of semantically relevant anomalous users.
• Topic modeling can provide a concise description of how a user behaves in the context of his peers and the meaning of that behavior.
• Model users as samples from a Dirichlet distribution over topic multinomials.
Latent Dirichlet Allocation (LDA)
[Figure: LDA maps Patient 1's raw diagnosis features (d1 ... d4500) to diagnosis topic features, e.g. Topic 1 = 0.2, Topic 2 = 0.1, Topic 3 = 0.70.]
Topic Distributions
[Figure: example diagnosis topics: Neoplasm Topic, Obstetric Topic, Kidney Topic.]
Characterizing Users
[Figure: User and Accessed Patient Topic Distributions. Left axis: topic probability (0 to 1) over Topic ID (Topic 1, Topic 2, Topic 3). Right axis: Number of Accesses (0 to 100). Patient 1: 100 times; Patient 2: 30 times.]
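One plausible reading of this slide is that a user's position in topic space is the access-weighted average of the topic distributions of the patients the user opened. The sketch below assumes exactly that; the slides do not spell out the formula. Patient 1's topic vector and the access counts (100 and 30) come from the slides, while Patient 2's topic vector and all names are hypothetical.

```python
def user_topic_distribution(access_counts, patient_topics):
    """Access-weighted average of the topic vectors of accessed patients."""
    total = sum(access_counts.values())
    n_topics = len(next(iter(patient_topics.values())))
    mix = [0.0] * n_topics
    for patient, count in access_counts.items():
        weight = count / total  # share of this user's accesses
        for t, prob in enumerate(patient_topics[patient]):
            mix[t] += weight * prob
    return mix

patient_topics = {
    "patient_1": [0.2, 0.1, 0.7],  # topic features shown on the LDA slide
    "patient_2": [0.6, 0.3, 0.1],  # hypothetical values
}
access_counts = {"patient_1": 100, "patient_2": 30}
user_vec = user_topic_distribution(access_counts, patient_topics)
```

Since each patient vector sums to 1 and the weights sum to 1, the resulting user vector is again a distribution over topics, dominated here by the heavily accessed Patient 1.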
Multidimensional Scaling: Patient Diagnosis
RTA: Simulating Users
r ~ Dir(α) with n dimensions, where n is the number of topics.
a.) Directed or Masquerading User (α<1): an anomalous user of some specialty gains sole access to the terminal of another user in the hospital.
b.) Purely Random User (α=1): the user is characterized by completely random behavior, with little semantic congruence to the hospital setting.
c.) Indirect User (α>1): the user type resembles an even blend of the topics of many specialized users.
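The effect of α on the three user types can be seen by drawing from a symmetric Dirichlet. The sketch below uses the standard construction of normalizing independent Gamma(α, 1) draws; the function name, the seed, and the choice of 10 topics are assumptions, and the α values are the ones shown on the following slides.

```python
import random

def sample_symmetric_dirichlet(alpha, n_topics, rng):
    """Draw r ~ Dir(alpha, ..., alpha) by normalizing Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n_topics)]
    total = sum(draws)
    if total == 0.0:
        # Very small alpha can underflow every draw to 0; fall back to a
        # single random topic, which is the limiting behavior as alpha -> 0.
        one_hot = [0.0] * n_topics
        one_hot[rng.randrange(n_topics)] = 1.0
        return one_hot
    return [d / total for d in draws]

rng = random.Random(0)  # fixed seed, for reproducibility only
for alpha in (0.01, 0.1, 1.0, 100.0):
    r = sample_symmetric_dirichlet(alpha, n_topics=10, rng=rng)
    print(alpha, [round(x, 2) for x in r])
```

Small α concentrates almost all mass on one or two topics (directed users), α = 1 is uniform over the simplex (purely random users), and large α yields a near-even blend of all topics (indirect users).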
Population Distribution
A. Directed Users
α = 0.01
α = 0.1
B. Purely Random Users
α = 1
C. Indirect Users
α = 100
Role Distribution
[Figure: real users and injected anomalous users (masquerading, purely random, and indirect) for the role NMH Resident Fellow CPOE.]
Random Topic Access Detection (RTAD)
• Random Topic Access Detection (RTAD): an anomaly detection framework that generates synthetic users using RTA and applies a standard spatial outlier detection scheme, k-nearest neighbors (k-NN), for classification.
• Methodology
1. LDA: define patient topics, and user typing to represent users in the topic space.
2. RTA user injection: generate three types of anomalous users and insert them into each role at a 5% mix rate.
3. Detection (k-NN): if the ratio of the average distance from a user to its k nearest spatial neighbors to the average pairwise distance among those neighbors is greater than a threshold, call the user anomalous.
4. Evaluation Metric: best Area Under the Curve (AUC) for each (α, role) combination.
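The detection step above can be sketched as follows. This is a minimal illustration under stated assumptions: Euclidean distance in topic space, the threshold value, and the example vectors are all hypothetical, since the slides do not specify them. It requires k of at least 2 so that pairwise distances among neighbors exist.

```python
import math
from itertools import combinations

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_ratio_score(user, others, k):
    """Avg distance to the k nearest neighbors divided by the avg pairwise
    distance among those neighbors (k must be at least 2)."""
    neighbors = sorted(others, key=lambda o: euclidean(user, o))[:k]
    to_neighbors = sum(euclidean(user, n) for n in neighbors) / len(neighbors)
    pairs = list(combinations(neighbors, 2))
    among = sum(euclidean(a, b) for a, b in pairs) / len(pairs)
    if among == 0.0:
        # Neighbors coincide: any positive distance is infinitely unusual.
        return math.inf if to_neighbors > 0.0 else 0.0
    return to_neighbors / among

def is_anomalous(user, others, k, threshold):
    return knn_ratio_score(user, others, k) > threshold

# Hypothetical topic vectors: a tight cluster of real users plus one outlier.
cluster = [[0.80, 0.10, 0.10], [0.78, 0.12, 0.10],
           [0.82, 0.08, 0.10], [0.79, 0.10, 0.11]]
outlier = [0.10, 0.10, 0.80]
outlier_flag = is_anomalous(outlier, cluster, k=3, threshold=2.0)
inlier_flag = is_anomalous(cluster[0], cluster[1:], k=3, threshold=2.0)
```

Normalizing by the spread among the neighbors makes the score relative to local density, so a user far from a tight cluster scores high while a user inside it scores below 1.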
Results - I
The best AUC across all evaluated dimensions is plotted for each role that performs poorly for α > 1.
Results - II
The best AUC across all evaluated dimensions is plotted for each role that performs well or near average for α > 1.
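For reference, the AUC reported in these results can be computed directly from detector scores and ground-truth labels, without sweeping thresholds, as the probability that a randomly chosen anomalous user scores higher than a randomly chosen real user. The sketch below is a generic pairwise implementation, not the authors' evaluation code; the function name and the example scores are hypothetical.

```python
def auc(scores, labels):
    """Probability that an anomalous user (label 1) outscores a real user
    (label 0); ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical detector scores: two injected users, three real users.
example_auc = auc([3.1, 0.7, 0.9, 0.6, 0.8], [1, 1, 0, 0, 0])
```

An AUC of 1.0 means every injected user outscores every real user, and 0.5 corresponds to chance. The quadratic pairwise loop is adequate for a sketch; a rank-based computation would scale better on full audit logs.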