Big Data Intelligence - OWASP Appsec USA 2013

Big Data Intelligence
Harnessing Petabytes of WAF statistics to Analyze & Improve Web Protection in the Cloud
Ory Segal, Tsvika Klein
Akamai Technologies
About Us
• Ory Segal
– Principal Product Architect, Cloud Security
• Tsvika Klein
– Product Manager, Cloud Security
Topics to Cover
Akamai & OWASP ModSecurity CRS Relationship
Security Big Data @ Akamai
Measuring WAF Accuracy @ Akamai
CRS through the Big Data Prism (Lessons Learned)
About Us
But we only have 45 minutes…
And too much data to cover…
Akamai & OWASP CRS
This is not an Akamai marketing presentation
Akamai has been offering its cloud-based WAF since 2009.
Kona Site Defender:
– OWASP CRS (Akamai Kona Rules)
– DDoS Protection
– DNS Protection
– Bot Detection
– Site Shield / Site Cloaking
OWASP CRS was ported to Akamai MD, and does not run
directly on ModSecurity
SECURITY BIG DATA @ AKAMAI
Akamai Intelligent Platform
Akamai’s cloud platform enables secure,
high-performing user experiences on any device, anywhere
120,000+ Servers
2,000+ Locations
750+ Cities
82 Countries
1,100+ Networks
Highlights:
• 100 million page views per second and 500 billion hits per day
• 734 million IP addresses seen quarterly
• 260+ Terabytes of compressed daily logs
• 30% of all internet traffic
CSI Platform Statistics
10 Terabytes of daily attack data
2 Petabytes of security data stored
45 days retention
140K concurrent connections (incoming data)
600K log lines / sec. indexed by 30 dimensions
8000 queries daily scanning terabytes of data
CSI High Level Architecture
[Architecture diagram: Akamai Edge Servers, Log Agent, Hadoop, HBase, Yoda Adapter, Yoda, FE and BE applications]
Yoda (Distributed Query Engine)
Interactive
Multiple data streams
Intuitive query language
High cardinality aggregation
Security Big Data
Challenge #1
Security Big Data
Challenge #2
Sample Data App - SARA
Interactive Tool to Analyze Security Events
BACK TO WAF & OWASP CRS…
WAF Accuracy Lingo
• Imagine a WAF that protects against 100% of all
possible attack vectors
…by blocking 100% of all HTTP requests
• Accurate WAF testing requires you to measure:
• How many real attacks were blocked (TP)
• How many valid requests were allowed through (TN)
• How much valid traffic was inappropriately blocked (FP)
• How many attacks were allowed through (FN)
Let's talk about measuring Precision, Recall, Accuracy, MCC…
Things You Need to Know
Precision = tp / (tp + fp): % of blocked requests that were actual attacks
Recall = tp / (tp + fn): % of attacks that were actually blocked
Accuracy = (tp + tn) / (tp + tn + fp + fn): % of decisions that were good decisions
MCC = (tp·tn − fp·fn) / sqrt((tp + fp)(tp + fn)(tn + fp)(tn + fn)): correlation between WAF decisions and the actual nature of requests
* MCC: http://en.wikipedia.org/wiki/Matthews_correlation_coefficient
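To make these formulas concrete, here is a minimal Python sketch (not part of the original deck) that computes the four metrics from raw TP/TN/FP/FN counts; the sample counts in the check are taken from the "Real" WAF row of the table that follows.

```python
# Sketch only: the four WAF accuracy metrics defined above,
# computed from confusion-matrix counts.
import math

def precision(tp, fp):
    return tp / (tp + fp) if (tp + fp) else None  # undefined if nothing was blocked

def recall(tp, fn):
    return tp / (tp + fn) if (tp + fn) else None

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# "Real" WAF row from the table below: 1000 requests, 990 valid, 10 attacks
tp, tn, fp, fn = 8, 987, 3, 2
print(precision(tp, fp), recall(tp, fn), accuracy(tp, tn, fp, fn), mcc(tp, tn, fp, fn))
# ~0.73, 0.80, 0.995, ~0.76
```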
Let's Look at Some Examples
WAF Type | Requests | Valid | Attacks | Blocked | TP | TN | FP | FN | P | R | A | MCC
Real | 1000 | 990 | 10 | 11 | 8 | 987 | 3 | 2 | 0.73 | 0.8 | 0.995 | 0.76
Off | 1000 | 990 | 10 | 0 | 0 | 990 | 0 | 10 | N/A | 0 | 0.99 | 0
Always Block | 1000 | 990 | 10 | 1000 | 10 | 0 | 990 | 0 | 0.01 | 1 | 0.01 | 0
Noisy | 1000 | 990 | 10 | 31 | 8 | 967 | 23 | 2 | 0.26 | 0.8 | 0.975 | 0.45
Conservative | 1000 | 990 | 10 | 2 | 2 | 990 | 0 | 8 | 1.00 | 0.2 | 0.992 | 0.45
A WAF’s accuracy needs to be measured both in its ability to block attacks and in its ability to allow good traffic through…
Introducing: Akamai WAF Testing Framework
Akamai WAF Testing (AWT) Framework
• Ability to send both valid & attack traffic
• Easily create or add new test cases:
• 3 methods: Text files, Burp Extender, Wireshark .pcaps
• Easily import test cases from Akamai’s Big Data platform
• Configurable and can work with any WAF
• Easily define success / fail criteria
• Intuitive XML & HTML reports
• Easy debugging of FP/FN with Anomaly Scoring (rule combinations)
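AWT's internals were not shown in the deck, so the following is only a hypothetical, stripped-down harness in the same spirit: replay labeled valid/attack requests against a WAF-protected endpoint and bucket each outcome as TP/TN/FP/FN. The test-case format, the target hostname, and the "blocked = HTTP 403" success criterion are assumptions, not AWT's actual design.

```python
# Hypothetical mini test harness (NOT the actual AWT framework).
# Assumptions: test cases are (label, method, path, body) tuples and the
# WAF signals a block with HTTP 403.
import requests

TARGET = "https://waf-protected.example.com"   # placeholder target

TEST_CASES = [
    # (expected_label, method, path, body)
    ("attack", "GET", "/search?q=<script>alert('xss')</script>", None),
    ("valid",  "GET", "/products?id=42", None),
]

def run(cases):
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for label, method, path, body in cases:
        resp = requests.request(method, TARGET + path, data=body, timeout=10)
        blocked = resp.status_code == 403          # assumed block criterion
        if label == "attack":
            counts["TP" if blocked else "FN"] += 1
        else:
            counts["FP" if blocked else "TN"] += 1
    return counts

if __name__ == "__main__":
    print(run(TEST_CASES))
```

The resulting counts feed directly into the Precision / Recall / Accuracy / MCC calculations shown earlier.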
AWT Built-In Test Cases
In order to accurately assess the WAF, we collected test cases from the following sources:
• Web interaction recordings of Alexa Top 100 internet sites – Commerce, Health, Consumer Electronics, Reference, Finance, …
• Recorded commercial web application scanner traffic
• Ported common False Positive cases from Akamai customers (Big Data)
• Attacks from the Akamai CSI big data platform
• Ported "valid" test cases from other tools*
• Exploits from the internet (fuzzers, exploit-db, …)*
• Havij & SQLMap attacks
Tens of Thousands of HTTP Requests, divided 95% - 5%
AWT Reports – High Level Statistics
AWT Reports – Protection Statistics
AWT Reports – False Positives Analysis
OWASP CRS – LESSONS LEARNED
CRS Issue #1 – Risk Groups
• CRS 2.2.x uses a single anomaly score
– Visibility (granularity) issues – What really happened?
• Separate anomaly score “accounting” to smaller risk
groups (attack types)
– Clear understanding of which attack took place
XSS = 35, SQLi = 10, RFI = 0, LFI = 0, …
• Challenge:
– Requires mapping rules to risk groups
– Some rules contribute to more than one risk group
– Requires putting more thought into anomaly scoring – it’s not just one pile of rules/scores (a simple accounting sketch follows below)
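A minimal sketch of the per-risk-group accounting described above; the rule IDs, weights, and group assignments are illustrative assumptions, not the actual Akamai/CRS mapping.

```python
# Illustrative per-risk-group anomaly accounting (assumed rule -> group
# mapping, not the real CRS/Akamai configuration).
from collections import defaultdict

# rule id -> (score, risk groups it contributes to); a rule may feed >1 group
RULES = {
    "958001": (5, ["XSS"]),
    "950901": (5, ["SQLi"]),
    "959151": (3, ["XSS", "SQLi"]),   # example of a rule shared by two groups
}

def group_scores(triggered_rule_ids):
    scores = defaultdict(int)
    for rid in triggered_rule_ids:
        weight, groups = RULES[rid]
        for g in groups:
            scores[g] += weight
    return dict(scores)

print(group_scores(["958001", "959151"]))   # {'XSS': 8, 'SQLi': 3}
```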
CRS Issue #2 – Multiple Thresholds
Thresholds: <xss> 25, <xml> 5
• XSS Attack: <script>alert('xss')</script> => Score 30
• CMDi Attack: ; /bin/sh cat /etc/passwd => Score 5
• Valid XML: <book>Hello World</book> => Score 10
Different risks require different anomaly thresholds.
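A short illustration of the point, using the scores from this slide; the per-group thresholds (25 for XSS, 5 for CMDi) and the assignment of the valid-XML score to the XSS group are assumptions made for the example.

```python
# Scores from the slide: XSS attack = 30, CMDi attack = 5, valid XML = 10.
# Assumption: the valid XML only trips XSS-type rules (angle brackets),
# so its score lands in the XSS group.
samples = {
    "XSS attack":  ("XSS", 30),
    "CMDi attack": ("CMDi", 5),
    "Valid XML":   ("XSS", 10),
}

GLOBAL_TH = 25                          # single anomaly threshold
PER_GROUP_TH = {"XSS": 25, "CMDi": 5}   # assumed per-risk-group thresholds

for name, (group, score) in samples.items():
    print(name,
          "| single-threshold block:", score >= GLOBAL_TH,
          "| per-group block:", score >= PER_GROUP_TH[group])

# Single threshold: the CMDi attack (5) slips through, and lowering the
# threshold to catch it would also block the valid XML (10).
# Per-group thresholds: the CMDi attack is blocked, the valid XML still passes.
```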
CRS Issue #2
[Chart: per-risk-group anomaly scores/thresholds for XSS, SQLi, CMDi, HTTP, RFI; y-axis 0–30, TH marked]
CRS Issue #3 – HTTP Violations
“BLOCK HTTP PROTOCOL VIOLATIONS?!??? THAT’S LIKE 1.21 PETABYTES OF LOGS PER DAY!!!!!”
CRS Issue #3 – HTTP Violations
• HTTP RFC Enforcement?! Good Luck!
– APIs, REST services, RSS feeds, Good Bots – most don’t
adhere to HTTP RFC
– Prior to system tuning:
• Missing Accept Header (960015): 14%
• Missing User-Agent Header (960009): 3%
• Can’t trust HTTP violation rules on their own
– “Invalid HTTP” risk group with its own threshold
• Blocks only seriously-damaged HTTP requests
– Build more focused tool fingerprints
• See next slide for an explanation on 960015
960015 – Research into 3 Hours of Triggers
• Which URLs trigger this rule? 85% static media files
• Perhaps a unique User-Agent? “Android” string found in 50%
• Anything in common? 95.1K “unique” UAs – Common: Android (50%), AppleWebKit (19%), News (21%), App (20%)
• Can you give me something else?
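The kind of trigger analysis this slide walks through can be sketched roughly as below: bucket 960015 events by URL type and count common User-Agent substrings. The one-JSON-event-per-line input format is an assumption, not the CSI platform's actual schema.

```python
# Assumed input: one JSON trigger event per line, e.g.
#   {"rule": "960015", "url": "/img/logo.png", "ua": "SomeNewsApp (Android 4.1)"}
import json
import sys
from collections import Counter

STATIC_EXT = (".png", ".jpg", ".gif", ".css", ".js", ".ico")
SUBSTRINGS = ["Android", "AppleWebKit", "News", "App"]

url_kinds, ua_hits, total = Counter(), Counter(), 0

for line in sys.stdin:
    event = json.loads(line)
    if event.get("rule") != "960015":
        continue
    total += 1
    url_kinds["static" if event["url"].lower().endswith(STATIC_EXT) else "other"] += 1
    for s in SUBSTRINGS:
        if s in event.get("ua", ""):
            ua_hits[s] += 1

print("total triggers:", total)
print("URL breakdown:", dict(url_kinds))
if total:
    print("UA substring hits:", {s: f"{100 * ua_hits[s] / total:.0f}%" for s in SUBSTRINGS})
```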
CRS Issue #4: Cookies
YEAR: 2003
SESSID = 12f0a0193b4d93e9s92a39af;
Quite easy to spot a SQLi or XSS payload in a cookie
CRS Issue #4: Cookies
YEAR: 2013
C1state = 24~1~-1~1~E~6~6~6~10~10~0~0~|~37A1B34A~2EBA820B~0AEBA380~130959B9~0327C30B~7617CC73~21B797A5~C6392AF5~5FE036DB~|~8A173E13~7F5D33BF~30DFEF65~|~~|~0~1~2~3~4~5|3~4~6~7~8||0~1~2|4~4~6||~|~0~0~0~0~0~0~|~0~0~0~0~0~|~~|~~|~~|~~|;
C2state = PC#1382573257902104085.19_06#1384742638|cat#true#1383533098|session#1383533019933-203317#1383534898;
C3data = {"v":1,"rid":"1371546489873_699561","to":5,"c":"http://www.some.site/page.aspx?a=5","pv":2,"lc":{"d0":{"v":2,"s":true}},"cd":0,"sd":0,"f":1371546904751};
Cinfo = 1403D3394_232#scroll on "//<![CDATA[(function() { var f5_cspm = { pass_params: '1102912_0394939_19210_24253..."
CRS Issue #5: Score Spreading Across Selectors
In many FP scenarios, the score spreads across “selectors”:
c1 = 1384044727071|ABCD:2::|AC:1::|PSD:0:AKFJ~MOBILE^CLAK_KOL:1385149290276 [950901 - 5]
c2 = bn:Samsung|mn:GT-I9300 Galaxy S III|tb:false|mb:true|dos:Android|dosv:4.1|bos:KJSKKL|bosv:9 [981172 - 3]
c3 = PC#1383939352901916004.20_14#1386636727|check#true#1384044787|session#1384044726390399957#1384046587 [981231 - 3]
c4 = “” [981318, 981242 – 2, 5]
(Total Score: 18)
Consider an FP-reduction heuristic that reduces the total score when it is spread across selectors? There are security implications… (one possible shape is sketched below)
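The deck only raises the idea, so the following is just one possible shape such a heuristic could take (an assumption, not the CRS or Akamai implementation): dampen the total anomaly score when no single selector dominates it.

```python
# One possible FP-reduction heuristic for score spreading (an assumption,
# not the CRS or Akamai implementation): if no single selector contributes
# more than `dominance` of the total, scale the total down.
def adjusted_score(selector_scores, dominance=0.5, damping=0.5):
    total = sum(selector_scores.values())
    if total == 0:
        return 0
    top_share = max(selector_scores.values()) / total
    return total if top_share >= dominance else total * damping

# The example above: c1=5, c2=3, c3=3, c4=7 (981318 + 981242 -> 2 + 5)
print(adjusted_score({"c1": 5, "c2": 3, "c3": 3, "c4": 7}))   # 18 * 0.5 = 9.0
```

As the slide cautions, there are security implications – an attack deliberately split across selectors would be damped as well, so any such heuristic would need careful tuning.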
CRS Issue #6: Rule Inefficiency
During our big data analysis & AWT usage, we
noticed a few troubling rule issues:
– Many rules have redundancies in their expressions
• This tends to push the anomaly score up in many scenarios (“reinforcing a FP”)
• It forces pushing the threshold much higher than really needed
– Some rules combine weak & strong signatures
• FP-prone rules generate a high score – reducing their “weight” hurts the accurate signatures in them
– Some rules seemed almost useless – e.g. 981172
Summary
• Big Data:
– OWASP / ModSecurity should consider collecting
anonymized trigger information
– CRS would greatly benefit from a much larger sample set
• CRS Future:
– Akamai has already contributed to the CRS project and will continue to contribute back to the community
– We highly recommend adopting some of the major
changes done @ Akamai – mainly the “risk groups” model
& multiple thresholds
• WAF Testing:
– Now that the WAF industry has matured, it is time for WAF deployments to be measured for accuracy using the tools & methods mentioned here – Precision, Recall and MCC
THANK YOU