Cybersecurity Project Overview

9/20/2013
Outline
• Introduction
• Literature & Technical Review
– Forums
– IRC
– Honeypots
• Project Status
• Research Projects
• Conclusion
Introduction
• As computers become more ubiquitous throughout society, the
security of networks and information systems is a growing concern.
– An increasing amount of critical infrastructure relies on computers and
information technologies
– Advancing technologies have enabled hackers to commit cybercrime much
more easily now than in the past.
• At the same time, accessibility to technologies and methods to
commit cybercrime has grown (Radianti & Gonzalez, 2009)
• As a result, more researchers have become interested in the cyber
domain.
Introduction
• Traditional cybersecurity research has focused on technological challenges and
improvements to mitigate cyberattacks (Geer, 2005)
– Overall, there has been a lack of work investigating cybercriminal communities and the
human element behind cybercrime (Hopper et al, 2009; Holt & Kilger, 2012).
– Little is understood about hacker social behaviors, the cybercriminal supply chain, etc.
• Recently, security researchers have begun conducting more explorations of hacker
communities in tandem with focusing on the technological element of security
– Hacker communities contain useful information about cybercriminal black markets,
emerging threats, attack trends, tutorials, malware samples, etc. (Radianti & Gonzalez,
2009; Motoyama et al, 2011; Benjamin & Chen, 2012).
– Many unique research questions can be investigated using data collected from hacker
communities, providing new insights for security researchers and practitioners
• Here we review hacker community research relevant to our project goals
Literature Review
• Existing literature is useful for providing details on the various facets
of cybersecurity research
• To conduct our own research, we borrow insights and methodologies
commonly identified within reviewed literature
– The human element behind cybercrime, including explorations of hacker
forums, IRC channels, and other hacker social media.
– More traditional security research, such as malware analysis, honeypot research,
botnet research, and research utilizing network logs.
– Information on identifying data sources, data collection methodologies,
analytical methods, and existing research gaps.
Forums - Identification
Public Sources
• Researchers look to third parties for information on identifying hacker forums
– Radianti et al, 2007, found hacker forums cited in news or other media
– Others have utilized the Google Safe Browsing API to acquire data on malicious or
cybercrime related websites (Cova et al, 2010).
Keyword Searches
• Another method commonly used is to conduct a series of keyword searches.
– For example, Holt & Lampke, 2010, crafted the keyword search “carding dump purchase
sale cvv” to identify hacker black markets where stolen credit card information is sold.
– Keyword searches are common in similar studies of hacker forums
(Fallmann et al, 2010).
Link Identification
• Lastly, it is common practice to scrutinize known forums for links to other hacker
forums and communities.
– Many studies found that hacker forum participants often cite or refer to other hacker
communities (Radianti et al, 2009; Fallmann et al, 2010; Holt et al, 2012).
– Thus, a snowball approach using one forum to identify many others could be promising.
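A minimal Python sketch of the snowball approach, assuming a caller-supplied function that returns the forums linked from a given forum. The link graph, the forum names, and `get_links` are hypothetical placeholders, not data from the reviewed studies.

```python
from collections import deque

def snowball(seeds, get_links, max_forums=100):
    """Breadth-first expansion from seed forums to the forums they link to."""
    found = list(dict.fromkeys(seeds))  # keep order, drop duplicate seeds
    seen = set(found)
    queue = deque(found)
    while queue and max_forums > len(found):
        forum = queue.popleft()
        for linked in get_links(forum):
            if linked not in seen:
                seen.add(linked)
                found.append(linked)
                queue.append(linked)
                if len(found) >= max_forums:
                    break
    return found

# Hypothetical link graph standing in for links scraped from forum posts.
TOY_LINKS = {
    "forum-a.example": ["forum-b.example", "forum-c.example"],
    "forum-b.example": ["forum-a.example", "forum-d.example"],
    "forum-c.example": [],
    "forum-d.example": ["forum-c.example"],
}

def toy_get_links(forum):
    return TOY_LINKS.get(forum, [])

discovered = snowball(["forum-a.example"], toy_get_links)
print(discovered)
```

In practice `get_links` would extract outbound links from collected threads; the `max_forums` cap keeps the expansion bounded.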
Forums - Collection
Manual Collection
• Most of the reviewed literature resorted to manual collection or observation
of data
• Some researchers simply observe live forums without attempting any sort of
collection (Holt, 2010; Yip, 2011).
– However, it is acknowledged that taking more active collection approaches,
such as registering forum accounts, is at times useful to gain access to
restricted contents (Holt, 2010)
– Such restrictions are often lifted only after a participant has registered with
a forum or has been part of the community for a certain length of
time.
• Other researchers move beyond observation and manually download
threads (Radianti et al, 2009; Holt & Lampke, 2010; Motoyama et al, 2011).
– It is important to store data intended for research offline, as hacker forums
may sometimes spontaneously disappear or reduce visibility (Radianti,
2010).
– Manually collected contents are often also manually coded (Radianti et al,
2009; Holt et al, 2012).
Forums - Collection
Automated Collection
• Other researchers utilize more automated data collection methods.
– For example, Benjamin & Chen, 2012, used a web crawler to automatically
collect all publicly available content from American and Chinese hacker forums.
• However, anti-crawling measures are sometimes put in place by
hacker forums (Spencer, 2008; Fallmann et al, 2010)
– Heavy anti-crawling measures make automated collection a very difficult and
slow process
– In some cases, it may be necessary to use proxy servers and other identity
obfuscation techniques to avoid detection of crawling activities (Goel, 2011).
– Anti-crawling measures appear to be a major reason why most research to
date has relied on manual collection methods
Forums – Anti-Crawling Measures
• Bandwidth monitoring
– Can be circumvented by creating crawlers that employ
more human-like crawling rates
– Can also be circumvented by creating a “distributed
crawler”.
• One computer acts as a master and distributes hyperlinks to
different computers for crawling. Newly discovered
hyperlinks are reported to the master.
• This way, it appears that different users are accessing the
community when in fact you are just using different
computers to share the spidering work on one forum.
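The master/worker idea and the human-like crawl rate can be sketched in Python as follows. This is a single-process simulation under stated assumptions: the workers are just numbered slots rather than separate computers, `fetch_links` is a hypothetical stand-in for downloading a page and extracting its hyperlinks, and the 2 to 8 second delay range is an arbitrary illustration.

```python
import random
from collections import deque

def human_like_delay(rng, low=2.0, high=8.0):
    """Pick a randomized pause (seconds) instead of a fixed, bot-like interval."""
    return rng.uniform(low, high)

def distribute(seed, fetch_links, n_workers=3, rng=None):
    """Master loop: hand each URL to one worker and collect the links it reports."""
    rng = rng or random.Random(0)
    frontier = deque([seed])
    seen = {seed}
    assignments = {w: [] for w in range(n_workers)}
    planned_wait = {w: 0.0 for w in range(n_workers)}
    turn = 0
    while frontier:
        url = frontier.popleft()
        worker = turn % n_workers          # round-robin assignment
        turn += 1
        assignments[worker].append(url)
        planned_wait[worker] += human_like_delay(rng)
        for link in fetch_links(url):      # newly discovered links go back to the master
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return assignments, planned_wait

# Hypothetical forum pages and the links found on each.
TOY_PAGES = {
    "/index": ["/t1", "/t2"],
    "/t1": ["/t2", "/t3"],
    "/t2": [],
    "/t3": ["/index"],
}

def toy_fetch(url):
    return TOY_PAGES.get(url, [])

assignments, planned_wait = distribute("/index", toy_fetch, n_workers=2)
print(assignments)
```

Only the master tracks which hyperlinks have been seen, so no page is assigned twice even though several workers share one forum.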
Forums – Anti-Crawling Measures
• CAPTCHA images (verification codes)
– Requires human input to correctly enter verification codes – when a
correct CAPTCHA is submitted, the client obtains a session cookie used
to create an authenticated session with the server
– In this case, a crawler needs to be able to utilize such session cookies
so that re-input of a verification code is not needed. If the spider
cannot utilize the authenticated session cookie, the server will see the
spider as a new client and request verification through a CAPTCHA
image once again
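A minimal sketch of session-cookie reuse with Python's standard library, assuming the cookie has already been issued after a human entered the verification code. The cookie name, value, domain, and file name are hypothetical; the cookie is built by hand here only so the example runs without a server.

```python
import http.cookiejar
import os
import tempfile
import urllib.request

def make_cookie(name, value, domain):
    """Build a session cookie by hand (stands in for one set by the server)."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )

def opener_with_cookies(cookie_file):
    """Return an opener that sends any cookies previously saved to cookie_file."""
    jar = http.cookiejar.MozillaCookieJar(cookie_file)
    if os.path.exists(cookie_file):
        jar.load(ignore_discard=True, ignore_expires=True)
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar

cookie_file = os.path.join(tempfile.mkdtemp(), "session.txt")

# First run: store the authenticated session cookie on disk.
_, jar = opener_with_cookies(cookie_file)
jar.set_cookie(make_cookie("sessionid", "abc123", "forum.example"))
jar.save(ignore_discard=True, ignore_expires=True)

# Later run: the spider reloads the cookie, so the server sees the same session.
_, restored = opener_with_cookies(cookie_file)
restored_names = [c.name for c in restored]
restored_values = [c.value for c in restored]
print(restored_names)
```

Requests made through the returned opener carry the saved cookie, which is what keeps the server from treating the spider as a new client.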
Forums – Anti-Crawling Measures
• Paywalls
– Some communities require a fee to be able to browse and view content.
I do not know whether these communities are legitimate or scams; I
suggest we avoid them, as other, more open sources of data
exist.
• Waiting Periods
– Some forums require newly registered users to wait a certain length of
time before being able to access all forum contents
• Closed registration / Invitation-only
– Some forums close their registration or are invitation-only. We cannot
gain access to these unless someone provides us with an
existing account.
– We are registering accounts on the forums that we are already crawling
in case they someday close registration or become invitation-only.
Forums – Identity Obfuscation
• To avoid some anti-crawling measures, we must practice
identity obfuscation
– We may need to reduce bot-like behaviors during collection
– We may also want to mask our true identity
• Reducing the crawling rate is useful for circumventing anti-crawling measures that monitor bandwidth usage or page
views
• To mask our identity, we can route traffic through proxy servers or peer-to-peer networks
– This even lets us regain access to forums that ban us via IP bans
– Stand-alone web proxies and peer-to-peer networks such as Tor
are both useful for identity obfuscation
Forums – Identity Obfuscation
• Traditional proxy server configuration
Forums – Identity Obfuscation
Proxy Servers vs. Tor Network
• Requirements
– Proxy servers: None
– Tor network: The Tor network client (~9MB)
• Protocol
– Proxy servers: Typically HTTP or SOCKS
– Tor network: SOCKS only
• Usage
– Proxy servers: Send local network traffic to proxy server for re-routing to destination server
– Tor network: Tunnel local network traffic to local Tor client; Tor client automatically handles peer-to-peer networking and routing traffic to the destination server
• What does destination server see?
– Proxy servers: Proxy server IP address
– Tor network: IP address of the last Tor relay used to route your message to the destination server
• Assuming a new identity?
– Proxy servers: A new proxy server must be used in place of the current proxy
– Tor network: Tor client can automatically select new relay nodes when a new identity is needed
• Finding new servers?
– Proxy servers: Lists of public proxy servers exist across various websites that can be identified through keyword searches (e.g. “public proxies”)
– Tor network: The Tor client will automatically find new relays for the user. Selection parameters can be used to only use or exclude relays from specific countries
Forums – Identity Obfuscation
• There are many stand-alone public proxy servers
– However, they are usually overused and thus slow
– Not stable, many are short-lived
– Need to constantly find new servers
• It may be better to use a peer-to-peer anonymization network,
such as Tor or I2P
– Established peer-to-peer networks are more stable than stand-alone proxies
– P2P-network protocols often support automated server discovery
– Tor is perhaps the most popular of such networks; requires connection with a
special Tor network client (Ling et al, 2011; Tschorsch & Scheuermann, 2011;
Akhoondi et al, 2012)
– Other less popular networks exist, such as Freenet and I2P (Leavitt, 2009; Fu
et al, 2010)
Forums – Identity Obfuscation
Various screenshots of the graphical Tor controller Vidalia.
Left: A map allows users to view the locations of all published Tor relay nodes.
Middle: A real-time log of Tor network events allows users to monitor Tor activity.
Right: A basic interface allows Tor users to quickly assume a new identity by routing traffic through a new circuit.
Forums – Analytical Methods
• After hacker forum contents are collected, they can be analyzed using
content and network analyses
• Content analysis would be useful for understanding the discussions and
information inside hacker social media
– In the set of literature we reviewed, these studies tend to employ manual
collection and analytical methods
– Generally conduct simple counting and statistical work of coded contents for
analysis (Holt & Lampke, 2010; Radianti, 2010; Imperva 2012).
• Network analyses often aim to observe the relationships between forum
participants (Motoyama et al, 2011; Holt et al, 2012)
– Both manual observations and automated techniques have been utilized
– Helps to better understand the community social structures and hacker
interaction behaviors
IRC - Identification
• Cyber security research conducted on IRC channels often focuses
on questions relating to both hacker communities as well as
botnets.
• Hacker community IRC research is similar to forum studies, as
researchers attempt to locate hacker discussions and cybercriminal
black markets.
• Botnet related research is more focused on identifying botnet
command & control (C&C) channels
– C&C channels are chat rooms often used by cybercriminals to control
large groups of malware-infected “zombie” computers
• In either case, finding relevant IRC channels to collect data from is
a challenge
IRC - Identification
An example of a hacker IRC channel. A list of users, their messages,
and timestamps for each message can be seen. The participants are
discussing sqlmap, a tool for automated SQL injection and database
hijacking, as well as programming concepts. The top header also
includes links to other IRC channels affiliated with this one.
IRC - Identification
• As stated earlier, participants of hacker forums often
cite and provide URLs of other hacker resources.
– This includes IRC channels (Radianti et al, 2009; Radianti, 2010).
– Often, a hacker forum will have an associated IRC
channel, or forum participants will simply mention other
private channels
• Some researchers collect content from IRC channels at
random and perform content analysis to determine
whether a channel is hacker-related (Fallmann et al, 2010).
– Use of automated bots to log IRC chat data
– Machine learning classifier can be used to check contents
23
IRC - Identification
• A different research focus for some security researchers is to
identify botnet command and control (C&C) channels.
– These channels are used by cybercriminal “botmasters” to give
commands to collections of malware-infected computers that
covertly join the IRC channel and wait for instruction.
IRC - Identification
• C&C identification techniques have generally utilized
honeypots
– Honeypots are systems that are configured to simulate
computer systems with software vulnerabilities
– Can allow wild malware to intentionally exploit honeypot
vulnerabilities; malware behaviors can be captured and studied
in a sandboxed environment (Rajab et al, 2006; Lu et al, 2009).
– All code execution, system changes, and network traffic are
tracked and logged within a honeypot (Mielke & Chen, 2008;
Zhu et al, 2008).
– By observing outbound network traffic generated by malware,
researchers may potentially reveal botnet C&C channels and
other hacker-related web addresses.
IRC - Collection
• There are two common techniques used to collect IRC chat data, but
both involve logging of real-time chat.
– Logging IRC chat in real-time manually or using automated bots (Fallmann et al,
2010)
– Scraping IRC packet contents generated by a honeypot’s local network traffic
(Lu et al, 2009)
• Several strategies can be taken to effectively use bots and
ensure comprehensive data collection (Fallmann et al, 2010):
– Swap strategy – Some IRC channels will automatically disconnect users who appear
idle. Thus, it can be useful to occasionally rotate bots into different IRC channels for
logging, avoiding some problems with idling
– Running multiple bots in the same channel helps ensure
comprehensive collection in case some bots get disconnected
• Packet scraping requires the use of network traffic analyzer software
– Wireshark is a popular resource that can be freely used to capture network packets
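Whether chat is captured by a logging bot or recovered from packets, the raw IRC lines must be turned into records of user, channel, message, and timestamp. A minimal Python sketch, assuming the standard `:nick!user@host PRIVMSG #channel :text` line format; the function name, record layout, and sample line are illustrative only.

```python
import re
from datetime import datetime, timezone

# Groups: 1 = nick, 2 = target channel, 3 = message text.
PRIVMSG = re.compile(r"^:([^!\s]+)!\S+ PRIVMSG (\S+) :(.*)$")

def parse_privmsg(raw_line, received_at=None):
    """Turn one raw IRC line into a log record, or None if it is not a chat message."""
    match = PRIVMSG.match(raw_line.rstrip("\r\n"))
    if match is None:
        return None
    stamp = received_at or datetime.now(timezone.utc)
    return {
        "time": stamp.isoformat(),
        "channel": match.group(2),
        "nick": match.group(1),
        "text": match.group(3),
    }

when = datetime(2013, 7, 26, 13, 39, 8, tzinfo=timezone.utc)
record = parse_privmsg(":alice!~a@host.example PRIVMSG #channelname :hello world\r\n", when)
ignored = parse_privmsg("PING :irc.server.com")
print(record)
```

Server housekeeping lines such as PING return `None`, so only chat messages end up in the log.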
IRC – Analytical Methods
• Different forms of analysis should be used depending on research
goals and data. For example, the goals and methods used for analysis
would be different in:
– Botnet research with data from command & control channels
– Research on IRC channels affiliated with hacker forums or acting as social hubs
• The simplest method of analysis, much like hacker forums, is to
manually sift through data (Franklin et al, 2007; Fallmann et al. 2010;
Motoyama et al. 2011)
• Automated content and network analyses could be extended to IRC
datasets as well when studying hacker IRC channels
– Can reveal emerging threats, popular tools and methods
– May help with attack attribution
IRC – Analytical Methods
• For botnet C&C channels, there are common themes for analysis
• Characterizing botmaster activity
– Paxton et al, 2011 investigate the different operational styles used by
botmasters by computing some usage statistics per botnet master
– Mielke & Chen, 2008 use clustering to identify potential collaboration between
botmasters based on their participation across different known C&C channels
• Identifying botnets based on network traffic
– Much research is spent analyzing honeypot captures and network logs to
develop new techniques to combat evolving botnets (Lu et al, 2009; Choi &
Lee, 2012)
– Botnets are becoming increasingly more sophisticated in evading detection
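To illustrate the idea of grouping botmasters by shared C&C channel participation, here is a simple Python sketch. It is not the clustering method of Mielke & Chen, 2008, which is not detailed here; it swaps in Jaccard similarity with single-linkage grouping, and the account names, channel names, and 0.5 threshold are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Share of channels two accounts have in common, out of all channels either uses."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def group_by_overlap(participation, threshold=0.5):
    """Single-linkage grouping: link two accounts when channel overlap >= threshold."""
    parent = {name: name for name in participation}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for u, v in combinations(participation, 2):
        if jaccard(participation[u], participation[v]) >= threshold:
            parent[find(u)] = find(v)

    groups = {}
    for name in participation:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical observations: which C&C channels each account was seen in.
participation = {
    "bm1": {"#c1", "#c2", "#c3"},
    "bm2": {"#c2", "#c3"},
    "bm3": {"#c9"},
    "bm4": {"#c2", "#c3", "#c4"},
}
groups = group_by_overlap(participation)
print(groups)
```

Accounts that end up in the same group are candidates for closer manual inspection, not confirmed collaborators.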
Honeypots
• Honeypots are computers or clients that are set up with the
purpose of attracting and logging cyber-attacks in real time
– Often emulate or are exposed to live security vulnerabilities in order to
capture and monitor both malware and cyber-attackers
– Can be used to monitor various protocols, applications, or operating system
attacks
• As mentioned in botnet literature, honeypot log files can be useful
for identifying new botnets and observing malware
– Infected honeypots may be transmitting or receiving data from botnet C&C
channels. Analyzing network data could reveal new botnets
– Malware execution behaviors can be logged and studied for malware research
Honeypots
• Two types of honeypots exist (Zhuge et al, 2008; Cova et al, 2010):
– Low-interaction honeypots:
• Only assess malicious threats at a shallow level. They record occurrences of
attacks and some associated metadata, but the data capture of a low-interaction honeypot is typically limited to just recording that a malware or
cybercriminal incident occurred
• Easy to set up. For example, a low-interaction honeypot could be a web crawler
that randomly surfs the Internet to find websites that attempt to employ drive-by attacks (web pages attempting to remotely execute code through browser or
other application vulnerabilities). However, as it is low-interaction, it does not
allow the malware to execute; data capture is limited to logging the initial event.
– High-interaction honeypots:
• Typically provide a much more comprehensive behavioral analysis of malware
or hacker behavior. They may record all system changes, registry hooks, library
calls, etc. made by malware or that occur during a cyber-attack.
• High-interaction honeypots are more difficult to set up as they require
significantly more overhead (you basically need to dedicate an entire operating
system instance to acting as a high-interaction honeypot).
Honeypots – Honeypot Client List
• There are many open-source tools for deploying honeypots with
logging mechanisms. Below are some of the more popular ones cited
in papers and that frequently appear in search engine results:
| Name | Type | Software/Protocol Target | Developer |
|---|---|---|---|
| Capture-HPC | High-interaction | Web browser, Office applications, media players | Honeynet Project |
| HoneyClient | High-interaction | Web browser | MITRE |
| HoneyMonkey | High-interaction | Web browser | Microsoft Research |
| SHELIA | High-interaction | E-mail, Office applications | VU Univ. Amsterdam |
| UW Spycrawler | High-interaction | Web browser | Univ. of Washington |
| HoneyD | Low-interaction | Network services | Honeynet Project |
| HoneyC | Low-interaction | Web browser | Honeynet Project |
| Kippo | Low-interaction | SSH | Open Source Project |
| PhoneyC | Low-interaction | Web browser | Open Source Project |
| Thug | Low-interaction | ActiveX, JavaScript | Honeynet Project |
Honeypots – Analytical Methods
• As mentioned in the IRC literature review, monitoring honeypot
network traffic logs (or network logs in general) can reveal the
addresses of botnet C&C channels
• Honeypots can also provide log data which would be useful for
analysis
– Identifying the class of unknown malware by analyzing malware
execution behavior logs through machine learning classification and
clustering (Rieck et al, 2011).
– Automated identification of advanced persistent threats (APTs), an
increasingly important area of security research (Binde et al, 2011;
Hutchins et al, 2011).
Project Status
• After reviewing literature, we planned to collect
data through:
– Hacker forums
– Hacker IRC channels
– Honeypots
• We are interested in data relating to:
– Hacker social media
– Botnet C&C channels
– Honeypot log data
Project Status - Forums
• We identified 20 hacker forums from 4 geopolitical
regions that seemed of research interest
– 5 forums each from China, the Middle-East, Russia, and the
U.S.
– Languages: Mandarin, Arabic, Farsi, Russian, English
– Forums were manually explored to review activity levels,
depth of discussions, black market activity, social
mechanisms such as ‘friending’ or ‘liking’, and other
interesting aspects
• Relevant forums were chosen for collection using an
automated crawler.
– However, unlike traditional crawling, we must account for
anti-crawling measures
Project Status - Forums
| Forum | Region | Status | Anti-Crawling Measures? |
|---|---|---|---|
| bbs.51cto.com | China | Parsed | N |
| cnhonkerarmy.com | China | Parsed | N |
| Linuxprobe.com | China | Parsed | N |
| bbs.hackdark.com | China | Parsed | N |
| heishou.org | China | Parsed | N |
| v4-team.com | Middle-East/Iran (Arabic) | Parsed | Y |
| ashiyane.org/forums | Middle-East/Iran (Persian) | Parsed | N |
| forums.mihandownload.com | Middle-East/Iran (Persian) | Parsed | N |
| shabgard.org/forums- | Middle-East/Iran (Persian) | Parsed | N |
| arhack.net/vb | Middle-East/Iran (Arabic) | Parsed | Y |
| antichat.ru | Russia | Parsed | Y |
| exploit.in/forum | Russia | Spidering | Y |
| Zloy.bz | Russia | Parsed | N |
| forum.xeka.ru | Russia | Parsed | N |
| forum.xakepok.net | Russia | Parsed | N |
| Ic0de.org | United States | Parsed | N |
| Anon-hackers.com | United States | Parsed | Y |
| Vctools.net | United States | Parsed | N |
| Elitehackforums.com | United States | Parsed | N |
| Hackhound.org | United States | Parsed | N |
Project Status - Forums
Forums with Anti-crawling Measures
• v4-team.com (Middle-East/Iran, Arabic)
– Anti-crawling measure: Bans IP addresses associated with non-human browsing activity
– Work-around: Slowed crawlers down to crawl only a couple of pages every few seconds. Community has been collected.
• antichat.ru (Russia)
– Anti-crawling measure: Bans IP addresses associated with non-human browsing activity
– Work-around: Slowed crawlers down to crawl only a couple of pages every few seconds. Community has been collected.
• exploit.in/forum (Russia)
– Anti-crawling measure: Forum blocks US-based traffic
– Work-around: Use of Tor to route traffic through other countries before reaching destination server
• Anon-hackers.com (United States)
– Anti-crawling measure: Some contents are restricted from new or young accounts
– Work-around: Wait for account to age on the forum
Project Status - Forums
| Forum Name | # of Members | # of Threads | # of Posts | Start Date | End Date | Language |
|---|---|---|---|---|---|---|
| 51cto | 298,856 | 239,828 | 2,641,548 | 10-10-2005 | 07-09-2013 | Chinese |
| Cnhonkerarmy | 94,199 | 65,213 | 1,117,135 | 01-01-2000 | 07-09-2013 | Chinese |
| Hackdark | 14,515 | 11,499 | 233,108 | 01-10-2000 | 07-19-2013 | Chinese |
| Heishou | 10,589 | 29,215 | 190,465 | 09-13-2009 | 06-16-2013 | Chinese |
| Linuxprobe | 17,174 | 22,358 | 702,986 | 10-11-2009 | 07-09-2013 | Chinese |
| Xeksec.ru | 14,732 | 39,240 | 50,594 | 06-13-2007 | 09-08-2013 | Russian |
| forum.xakepok.net | 3,839 | 13,665 | 46,076 | 04-15-2009 | 08-31-2013 | Russian |
| zloy.bz | 14,302 | 69,385 | 485,231 | 10-01-2004 | 07-06-2013 | Russian |
| antichat.ru | 25,500 | 19,515 | 199,929 | 01-01-2003 | 08-19-2013 | Russian |
Project Status - Forums
| Forum Name | # of Members | # of Threads | # of Posts | Start Date | End Date | Language |
|---|---|---|---|---|---|---|
| Arhack.net | 19,649 | 46,285 | 429,507 | 07-01-2008 | 09-11-2013 | Persian |
| Shabgard.org | 2,922 | 11,895 | 77,903 | 08-26-2003 | 09-02-2013 | Persian |
| Mihandownload | 5,292 | 181,460 | 169,279 | 06-08-2008 | 09-14-2013 | Persian |
| ashiyane.org | 16,404 | 10,605 | 172,478 | 06-15-2003 | 08-11-2013 | Persian |
| v4-team.com | 12,934 | 33,999 | 102,757 | 02-26-2008 | 07-24-2013 | Arabic |
| ic0de.org | 771 | 3,166 | 12,347 | 01-01-2009 | 07-08-2013 | English |
| hackhound.org | 678 | 990 | 7,067 | 10-12-2012 | 09-02-2013 | English |
| anon-hackers.com | 1,109 | 883 | 3,949 | 09-11-2012 | 07-14-2013 | English |
| vctools.net | 6,563 | 10,383 | 42,935 | 07-27-2009 | 08-10-2013 | English |
| elitehackforums | 1,676 | 3,923 | 14,267 | 11-18-2011 | 08-22-2013 | English |

Deeper summarization of each forum is forthcoming (popular topics, top users, etc.)
Project Status - Forums
Black market discussions on a Russian hacker
forum (antichat.ru). Most discussions regard
the sale of stolen and pirated software.
In one popular thread, various stolen security
products are found for sale
Project Status - Forums
An Egyptian hacker shares a personal project on an American hacking forum (elitehackforums.com). This is an interesting case of hackers sharing tools with other hackers outside their own geopolitical region. Additionally, the poster appears to be affiliated with a group called the “Egyptian Shell Team.” It would be interesting to analyze occurrences of this hacker, software, and team name in the hacker social media of other geopolitical regions. Additionally, searches for the “Egyptian Shell Team” may reveal a group-run forum or IRC channel, perhaps used for coordination.
Project Status - IRC
• Hacker IRC channels can be detected by searching through hacker forums
– Some hacker forums host an official IRC channel
– Forum members often share IRC channels they are affiliated with
– IRC channels can be searched for automatically; the typical IRC server address
pattern is irc.server.com:port_number while the typical channel name is
#channelname. It should be noted that hashtags were used on IRC before they
appeared on Twitter and other social media.
• There are two ways to collect data, as described in the literature:
– Leaving chat-logging bots inside of IRC channels to collect data in real-time
– Collecting IRC–related network packets using a honeypot
• Data collection rates depend on popularity of IRC channels, which can vary
widely between channels
• Potential research direction:
– Few, if any, studies compare how hackers utilize different
communication media
– Check for correlation between IRC discussions and forum discussions, and describe
any differences.
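The automatic search for IRC references in collected forum posts can be sketched with two regular expressions in Python. The patterns below are rough assumptions based on the address and channel shapes described above, not a complete grammar; for example, hex colour codes such as `#ffcc00` would also match the channel pattern and need filtering.

```python
import re

# Server addresses shaped like irc.server.com or irc.server.com:6667.
SERVER = re.compile(r"\birc\.[a-z0-9-]+(?:\.[a-z0-9-]+)+(?::\d{2,5})?", re.IGNORECASE)

# Channel names shaped like #channelname; the leading alternative rejects a "#"
# that directly follows a word character or "&" (e.g. "item#5" or an HTML entity).
CHANNEL = re.compile(r"(?:^|[^\w&])(#[A-Za-z][\w-]{1,49})")

def find_irc_references(text):
    """Return candidate IRC servers and channel names mentioned in a forum post."""
    return {
        "servers": sorted(set(SERVER.findall(text))),
        "channels": sorted(set(CHANNEL.findall(text))),
    }

post = "Join us at irc.server.com:6667 in #channelname or #tools-chat. Price: item#5, see &#169;"
refs = find_irc_references(post)
print(refs)
```

Matches are only candidates; each server and channel still has to be checked manually before a logging bot is placed there.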
Project Status - IRC
A hacker on the American
community VCTools.net posts
information for other forum
participants to join a hacking-related IRC channel
Project Status - IRC
The front-end for a web-based IRC client to
connect users to Antichat.ru’s IRC server
Project Status - IRC
• Aside from hacker social rooms, IRC servers also often
host botnet C&C channels
– Aforementioned literature describes using honeypots to intentionally
execute malware and monitor network traffic for packets to/from a
potential botnet C&C channel
– This requires the selection and implementation of honeypot clients
• We currently have 2 honeypots running the Kippo
Honeypot software (hosted on DigitalOcean.com)
– Low-interaction honeypot
– Captures SSH protocol activity
– Selected for its simple setup and the popularity of the
SSH port
– Collected ~12,000 events thus far in two months (mostly
brute force attacks)
Project Status - Honeypots
• Brute force example:
2013-07-26 13:39:08+0000 [HoneyPotTransport,491,211.157.227.4] starting service ssh-userauth
2013-07-26 13:39:09+0000 [SSHService ssh-userauth on HoneyPotTransport,491,211.157.227.4] paul trying auth password
2013-07-26 13:39:09+0000 [SSHService ssh-userauth on HoneyPotTransport,491,211.157.227.4] login attempt [paul/cacutza] failed
2013-07-26 13:39:10+0000 [-] paul failed auth password
2013-07-26 13:39:10+0000 [-] unauthorized login:
2013-07-26 13:39:10+0000 [SSHService ssh-userauth on HoneyPotTransport,491,211.157.227.4] paul trying auth password
2013-07-26 13:39:10+0000 [SSHService ssh-userauth on HoneyPotTransport,491,211.157.227.4] login attempt [paul/paul] failed
2013-07-26 13:39:11+0000 [-] paul failed auth password
2013-07-26 13:39:11+0000 [-] unauthorized login:
2013-07-26 13:39:12+0000 [HoneyPotTransport,491,211.157.227.4] connection lost
2013-07-26 15:50:28+0000 [kippo.core.honeypot.HoneyPotSSHFactory] New connection: 211.157.227.4:39126 (192.241.216.71:22) [session: 492]
2013-07-26 15:50:28+0000 [HoneyPotTransport,492,211.157.227.4] Remote SSH version: SSH-2.0-libssh2_1.4.1
2013-07-26 15:50:28+0000 [HoneyPotTransport,492,211.157.227.4] kex alg, key alg: diffie-hellman-group1-sha1 ssh-rsa
2013-07-26 15:50:28+0000 [HoneyPotTransport,492,211.157.227.4] outgoing: aes128-ctr hmac-sha1 none
2013-07-26 15:50:28+0000 [HoneyPotTransport,492,211.157.227.4] incoming: aes128-ctr hmac-sha1 none
2013-07-26 15:50:29+0000 [HoneyPotTransport,492,211.157.227.4] NEW KEYS
2013-07-26 15:50:29+0000 [HoneyPotTransport,492,211.157.227.4] starting service ssh-userauth
2013-07-26 15:50:30+0000 [SSHService ssh-userauth on HoneyPotTransport,492,211.157.227.4] office trying auth password
2013-07-26 15:50:30+0000 [SSHService ssh-userauth on HoneyPotTransport,492,211.157.227.4] login attempt [office/cacutza] failed
2013-07-26 15:50:31+0000 [-] office failed auth password
2013-07-26 15:50:31+0000 [-] unauthorized login:
2013-07-26 15:50:31+0000 [SSHService ssh-userauth on HoneyPotTransport,492,211.157.227.4] office trying auth password
2013-07-26 15:50:31+0000 [SSHService ssh-userauth on HoneyPotTransport,492,211.157.227.4] login attempt [office/test123] failed
2013-07-26 15:50:32+0000 [-] office failed auth password
2013-07-26 15:50:32+0000 [-] unauthorized login:
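Log lines like those above can be summarized automatically. A minimal Python sketch, assuming each login attempt sits on a single line in the format shown; the regular expression is written against this sample only, and the "succeeded" alternative is an assumption about how a successful login would be logged.

```python
import re
from collections import Counter

# Groups: time, session number, source IP, user name, password, result.
LOGIN = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})[+-]\d{4} "
    r"\[SSHService ssh-userauth on HoneyPotTransport,(\d+),([\d.]+)\] "
    r"login attempt \[([^/\]]*)/([^\]]*)\] (failed|succeeded)$"
)
FIELDS = ("time", "session", "ip", "user", "password", "result")

def parse_attempts(log_text):
    """Extract one record per 'login attempt' line; other lines are skipped."""
    attempts = []
    for line in log_text.splitlines():
        match = LOGIN.match(line.strip())
        if match:
            attempts.append(dict(zip(FIELDS, match.groups())))
    return attempts

SAMPLE = """\
2013-07-26 13:39:09+0000 [SSHService ssh-userauth on HoneyPotTransport,491,211.157.227.4] login attempt [paul/cacutza] failed
2013-07-26 13:39:10+0000 [-] paul failed auth password
2013-07-26 13:39:10+0000 [SSHService ssh-userauth on HoneyPotTransport,491,211.157.227.4] login attempt [paul/paul] failed
2013-07-26 15:50:30+0000 [SSHService ssh-userauth on HoneyPotTransport,492,211.157.227.4] login attempt [office/cacutza] failed
2013-07-26 15:50:31+0000 [SSHService ssh-userauth on HoneyPotTransport,492,211.157.227.4] login attempt [office/test123] failed
"""

attempts = parse_attempts(SAMPLE)
by_ip = Counter(a["ip"] for a in attempts)
by_password = Counter(a["password"] for a in attempts)
print(by_ip.most_common(1), by_password.most_common(1))
```

Counting attempts per source IP and per password gives a quick view of which hosts are brute-forcing and which credentials they favour.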
Project Status - Honeypots
• Next honeypot to be implemented is called HoneyC
– A tool by the Honeynet Project, whose work seems often
used by security researchers
– Low-interaction honeypot, more advanced and
customizable than Kippo
• After HoneyC, another potential honeypot software to
look at is Capture-HPC
– Another Honeynet Project tool, but it is a high-interaction
honeypot client
– Requires more time and expertise to set up properly
– Can provide us with data useful for malware analysis and
IRC channel identification
Research Plan
Research Project – Reputation Study
• As previously mentioned, many cybercriminal assets can be found freely available
within hacker forums (Radianti & Gonzalez, 2009; Motoyama et al, 2011).
– Contents are often shared so that others can learn new techniques or help improve a shared technique
– Even legitimate tools, such as search engines, are discussed for use in cybercriminal attacks
• Due to open sharing of cybercriminal assets, successfully committing a cybercriminal
act is much more accessible and easier than in the past (Zhuge et al, 2008; Moore &
Clayton, 2009)
– This has predictably led to an increased incidence of cybercrime, which is a growing problem for
society
– However, increased accessibility of hacking software and techniques has also led to increased
competition among cybercriminals.
– For example, computer worms written by rival cybercriminals have been observed to seek
out and uninstall one another from victim computers (Crandall et al, 2011)
• It seems counter-intuitive that cybercriminals would provide free assistance and
resources to eventual competitors
Research Project – Reputation Study
[Figure: two annotated forum screenshots, Hackhound.org and Unpack.cn. Annotations: hacking tool interface; description of code functionality; hacker’s reputation score; attached hacking tool; embedded sample of code]
Left: A cybercriminal on hackhound.org publishes the latest version of his hacking tool meant to help others steal cached passwords on
victims’ computers. Right: A hacker of the Chinese community Unpack.cn posts sample code demonstrating how to reverse engineer
software written in the Microsoft .NET framework.
Research Project – Reputation Study
• Perspectives from social psychology may be borrowed to attempt to explain this
phenomenon
– Social exchange theory states that all human relationships are the result of cost-benefit analysis
and consideration of opportunity costs (Emerson, 1976)
– This theory suggests that hackers would perform a cost-benefit analysis of aiding other hackers,
and subsequently choose to do so
– What may be the underlying motive for hackers to help one another?
• Past hacker literature reveals that peer approval and reputation are central to hacker
social circles
– Reputation of sellers and buyers is a large factor for facilitating successful cybercriminal black
market transactions. (Radianti 2010; Yip et al, 2013)
– Some hackers organize into groups to launch sophisticated, financially or politically motivated
cyber attacks; such groups often seek to recruit members known to be technically skilled,
educated, and able to utilize sophisticated cyber attack methods (Choo & Smith, 2008)
Research Project – Reputation Study
• Hackers may be sharing knowledge for gains in reputation
– Previous social science research has found that individuals that contribute to the cognitive
advancement of their community will experience increases in their reputation, often leading
to leadership positions (Muller, 2006)
– In the forum context, one would have to rely on crafting quality, insightful postings to
contribute to the cognitive advancement of their community, as they are the primary way to
communicate with other forum participants.
– Thus, forum messages should be evaluated for their relevance, quality, and contribution
• Since forum postings may play a large role in building hacker reputation, a
systematic method to analyze forum posts must be created
– Media synchronicity theory (MST) helps explain aspects of effective communication
performance of various forms of media (Fuller et al, 2008)
– It is especially useful for describing features of quality communication performance found in
new forms of media, including digital content
– Currently in the process of reviewing other papers to find features used in previous research
that I can borrow (HTML features, network features, content features, etc.)
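As a rough illustration of the feature types mentioned above, the sketch below extracts a few HTML and content features from a single forum post. The feature names, regular expressions, and sample post are assumptions made for illustration; they are not the feature set used in the study, and network features (which need thread and reply data) are left out.

```python
import re

def extract_post_features(html_body):
    """Return a few illustrative HTML and content features for one forum post."""
    # HTML features: count hyperlinks and code blocks before stripping markup.
    num_links = len(re.findall(r"<a\s[^>]*href=", html_body, flags=re.IGNORECASE))
    num_code_blocks = len(re.findall(r"<(?:code|pre)\b", html_body, flags=re.IGNORECASE))
    # Content features: computed on the text with tags removed.
    text = re.sub(r"<[^>]+>", " ", html_body)
    words = re.findall(r"[A-Za-z0-9_']+", text)
    return {
        "num_links": num_links,
        "num_code_blocks": num_code_blocks,
        "num_words": len(words),
        "avg_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
    }

# Hypothetical post body, invented for this example.
sample_post = ('<p>New version here: <a href="http://example.com/tool">download</a></p>'
               '<code>print(1)</code>')
features = extract_post_features(sample_post)
```

Each post becomes one row of numeric features, which can then be aggregated per forum member before any statistical analysis.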
Research Project – Reputation Study
• Preliminary study presented at IEEE Intelligence and Security Informatics, 2012
• We collected data from two hacker communities from the United States and China to
examine the mechanisms by which key actors arise
– Identified several features from literature that may contribute to hacker reputation
– Ran a regression to identify which features contributed most to
hacker reputation
• Found that hackers who participated frequently and also contributed
the most towards the cognitive advancement of their community had the
highest reputation
– Those who posted the most tools, source code,
and other helpful content saw the biggest gains in
reputation
– Others who participated regularly and were active
in their community also appeared to become reputable
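The regression step can be sketched as an ordinary least squares fit of reputation on per-member features. This is a minimal sketch under stated assumptions: the two feature names (posts per month, attachments shared) and all numbers are hypothetical, and the slides do not say which regression model the study actually used.

```python
def fit_linear_regression(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y."""
    X = [[1.0] + list(r) for r in rows]          # prepend an intercept column
    k = len(X[0])
    # Build the augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)] +
         [sum(x[i] * y for x, y in zip(X, targets))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Hypothetical data: (posts_per_month, attachments_shared) for five members.
rows = [(10, 0), (20, 1), (5, 4), (40, 2), (15, 6)]
# Reputation built as 2 + 0.5 * posts + 3 * attachments, so the fit is exact.
reputation = [7.0, 15.0, 16.5, 28.0, 27.5]
coefs = fit_linear_regression(rows, reputation)  # [intercept, posts, attachments]
```

Raw coefficients are only comparable across features measured on similar scales; standardizing the features first is a common way to compare their relative contribution.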
Research Project – Malware Analysis
Attack Vector Categories
Recently Posted Exploits
Research Project – Malware Analysis
Program loads library for network communications
Attack payload
Low-level instructions to access vulnerable application’s memory space
An example of a Perl exploit that attempts a remote buffer overflow attack on popular enterprise mail server software for Windows and
Unix. Malicious code such as this can be difficult for researchers to interpret in their
explorations. Automated static analysis tools can help in such scenarios.
Research Project – Malware Analysis
Machine learning for attack vector identification in malware source code
• Preliminary study presented at IEEE Intelligence and Security Informatics, 2012
• Source code files for various malicious programs and scripts are among the resources shared
– However, research on malicious source code is limited, largely due to both technical and language
limitations
– Documentation may not be provided with source code, or may be written in a foreign language
– Automated tools that help security professionals and researchers overcome such limitations would be a
great asset
• We collected nearly 4,000 malicious source code files written in three different programming languages, utilizing
four distinct attack vectors (local memory attacks, remote code execution attacks, web application exploits, and
denial of service scripts)
– Prior research notes that feature selection for malware analysis is difficult, so we utilize a genetic algorithm to select
the optimal feature set to classify exploits by their attack vector
– Used SVM and C4.5 decision tree algorithms for classification; accuracy varied between 80% and 95%
– Could improve accuracy by including program control flow features, opcodes, etc.
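The idea of genetic feature selection can be sketched as a search over binary feature masks, where each mask is scored by how well a classifier performs on the selected features. This is a minimal sketch under stated assumptions: a nearest-centroid classifier stands in for the SVM and C4.5 classifiers named above, and the four features (token counts) and the toy data are hypothetical.

```python
import random

def nearest_centroid_accuracy(samples, labels, mask):
    """Leave-one-out accuracy of a nearest-centroid classifier on masked features."""
    cols = [i for i, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    correct = 0
    for held_out in range(len(samples)):
        # Accumulate per-class sums over every sample except the held-out one.
        sums, counts = {}, {}
        for idx, (row, label) in enumerate(zip(samples, labels)):
            if idx == held_out:
                continue
            acc = sums.setdefault(label, [0.0] * len(cols))
            for j, c in enumerate(cols):
                acc[j] += row[c]
            counts[label] = counts.get(label, 0) + 1

        def distance(label):
            return sum((samples[held_out][c] - sums[label][j] / counts[label]) ** 2
                       for j, c in enumerate(cols))

        predicted = min(sums, key=distance)
        correct += predicted == labels[held_out]
    return correct / len(samples)

def select_features(samples, labels, generations=20, pop_size=12, seed=0):
    """Genetic search over binary feature masks; returns (best_mask, best_accuracy)."""
    rng = random.Random(seed)
    n = len(samples[0])
    fitness = lambda m: nearest_centroid_accuracy(samples, labels, m)
    # Start from the all-features mask plus random masks.
    population = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)]
                              for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:pop_size // 2]      # truncation selection
        children = [parents[0][:]]                # elitism: keep the best mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)             # single-point crossover
            child = a[:cut] + b[cut:]
            flip = rng.randrange(n)
            if rng.random() < 0.2:                # flip one bit with probability 0.2
                child[flip] = 1 - child[flip]
            children.append(child)
        population = children
    best = max(population, key=fitness)
    return best, fitness(best)

# Hypothetical counts of: "socket", "strcpy", "<script", and a noise token.
samples = [[3, 0, 0, 5], [2, 1, 0, 1], [4, 0, 0, 9],
           [0, 3, 0, 7], [0, 4, 0, 2], [1, 2, 0, 8],
           [0, 0, 3, 1], [0, 0, 2, 9], [0, 0, 4, 4]]
labels = ["remote"] * 3 + ["local"] * 3 + ["web"] * 3
best_mask, best_accuracy = select_features(samples, labels)
```

Because the best mask is carried over unchanged each generation and the initial population includes the all-features mask, the returned accuracy is never worse than using every feature.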
Research Project – Botnet C&C Channels
Identifying social networks in botnet C&C channels
• A botnet monitoring group, the ShadowServer Foundation, provided the AI Lab with logs from
multiple botnet C&C channels
• Text mining techniques were used to differentiate bot masters from connected zombie
computers
– Bot master names were tracked across all channels
– Several names appeared frequently across the data set
• By clustering bot masters according to their channel
participation, potential collaboration between bot masters
can be identified
• The roles of individuals within each group,
and the overall operational style of each group
can be identified by further analyzing C&C logs
• Additionally, logs could be used to identify C&C activity patterns; this could help automatically
identify future C&C channels
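Clustering bot masters by channel participation can be sketched as grouping nicknames whose sets of channels overlap strongly. This is a minimal sketch under stated assumptions: the Jaccard similarity measure, the 0.5 threshold, and the single-linkage grouping are illustrative choices rather than the method used on the ShadowServer logs, and the nicknames and channel names are invented.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity between two sets of channel names."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_by_channels(participation, threshold=0.5):
    """Group nicknames whose channel sets overlap at or above `threshold`.

    Single-linkage grouping: two names share a cluster when a chain of
    sufficiently similar pairs connects them.
    """
    parent = {name: name for name in participation}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]   # path halving
            name = parent[name]
        return name

    for a, b in combinations(participation, 2):
        if jaccard(participation[a], participation[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in participation:
        clusters.setdefault(find(name), set()).add(name)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])

# Hypothetical nicknames and channels, not taken from the actual logs.
participation = {
    "master_a": {"#chan1", "#chan2", "#chan3"},
    "master_b": {"#chan1", "#chan2"},
    "master_c": {"#chan7"},
    "master_d": {"#chan7", "#chan8"},
    "master_e": {"#chan9"},
}
clusters = cluster_by_channels(participation)
```

Names that land in the same cluster are candidates for collaboration and would still need manual review of the underlying logs.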
Conclusion
• We’re continuing to work on data collection efforts
– Finishing forum collection
– Starting to identify and collect IRC channels
– Implementing more complex honeypots
• Collection team will provide summaries of deeper
content found
• Progress will continue on research projects
• Expand and publish hacker reputation study
• Revisit other projects, continue forum research
References
• Abu Rajab, M., Zarfoss, J., Monrose, F., & Terzis, A. (2006). A multifaceted approach to understanding the botnet phenomenon. Proceedings of the 6th ACM SIGCOMM on Internet measurement - IMC ’06, 41.
• Akhoondi, M., Yu, C., & Madhyastha, H. V. (2012). LASTor: A Low-Latency AS-Aware Tor Client. 2012 IEEE Symposium on Security and Privacy, 476–490.
• Benjamin, V., & Chen, H. (2012). Securing Cyberspace: Identifying Key Actors in Hacker Communities. IEEE Intelligence and Security Informatics.
• Binde, B. E., McRee, R., & O’Connor, T. J. (2011). Assessing Outbound Traffic to Uncover Advanced Persistent Threat. SANS Technology Institute.
• Cova, M., Kruegel, C., & Vigna, G. (2010). Detection and analysis of drive-by-download attacks and malicious JavaScript code. Proceedings of the 19th international conference on World wide web - WWW ’10, 281.
• Crandall, J. R., Forrest, S., & Ladau, J. (2011). The Ecology of Malware. Proceedings of the 1st ACM workshop on Security and privacy in smartphones and mobile devices, 99–106.
• Dholakia, U. M., Bagozzi, R. P., & Pearo, L. K. (2004). A Social Influence Model of Consumer Participation in Network- and Small-Group-Based Virtual Communities. International Journal of Research in Marketing.
• Dolfsma, W., & Soete, L. (2006). Understanding the Dynamics of a Knowledge Economy. Edward Elgar Publishing.
• Emerson, R. M. (1976). Social Exchange Theory. Annual Review of Sociology, 2, 335–362.
• Fallmann, H., Wondracek, G., & Platzer, C. (2010). Covertly Probing Underground Economy Marketplaces. Proceedings of the 7th international conference on Detection of intrusions and malware, and vulnerability assessment (DIMVA), 101–110.
• Franklin, J., Paxson, V., Perrig, A., & Savage, S. (2007). An Inquiry into the Nature and Causes of the Wealth of Internet Miscreants. Proceedings of the 14th ACM conference on Computer and communications security, 375–388.
• Fu, X., Ling, Z., Yu, W., & Luo, J. (2010). Cyber Crime Scene Investigations (C2SI) through Cloud Computing. 2010 IEEE 30th International Conference on Distributed Computing Systems Workshops, 26–31. doi:10.1109/ICDCSW.2010.33
• Fuller, R. M., & Valacich, J. S. (2008). Media, Tasks, and Communication Processes: A Theory of Media Synchronicity. MIS Quarterly, 32(3), 575–600.
• Geer, D. (2005). Malicious Bots Threaten Network Security. IEEE Computer Society, 38(1), 18–20.
• Goel, S. (2011). Cyberwarfare: Connecting the Dots in Cyber Intelligence. Communications of the ACM, 54(8), 132.
• Hall, A. T., Blass, F. R., Ferris, G. R., & Massengale, R. (2004). Leader Reputation and Accountability in Organizations: Implications for Dysfunctional Leader Behavior. The Leadership Quarterly, 15(4).
• Holt, T. J. (2010). Exploring Strategies for Qualitative Criminological and Criminal Justice Inquiry Using On-Line Data. Journal of Criminal Justice Education, 21(4), 466–487.
• Holt, T. J., & Kilger, M. (2012). Know Your Enemy: The Social Dynamics of Hacking. The Honeynet Project, 1–17.
• Holt, T. J., & Lampke, E. (2010). Exploring stolen data markets online: products and market forces. Criminal Justice Studies: A Critical Journal of Crime, Law, and Society, 23(1), 33–50.
• Holt, T. J., Strumsky, D., Smirnova, O., & Kilger, M. (2012). Examining the Social Networks of Malware Writers and Hackers. International Journal of Cyber Criminology, 6(1), 891–903.
• Hopper, L., Hopper, R., & Womble, P. (2009). Identifying network attacks from a social perspective. 2009 IEEE Conference on Technologies for Homeland Security, 511–515.
• Hutchins, E. M., Cloppert, M. J., & Amin, R. M. (2011). Intelligence-Driven Computer Network Defense Informed by Analysis of Adversary Campaigns and Intrusion Kill Chains. Lockheed Martin Corporation, (July 2005).
• Mielke, C. J., II, & Chen, H. (2008). Botnets, and the CyberCriminal Underground. IEEE International Conference on Intelligence and Security Informatics 2008, 206–211.
• Imperva. (2012). Imperva Hacker Intelligence Initiative. Monthly Trend Report #13. doi:10.1002/ana.23759
• Lampe, K. von, & Johansen, P. O. (2004). Organized Crime and Trust: On the Conceptualization and Empirical Relevance of Trust in the Context of Criminal Networks. Global Crime, 6(2).
• Jang, D., Kim, M., Jung, H., & Noh, B. (2009). Analysis of HTTP2P Botnet: Case Study Waledac. IEEE 9th Malaysia International Conference on Communications, 15–17.
• Kshetri, N. (2006). The Simple Economics of Cybercrimes. IEEE Security & Privacy, Jan-Feb, 33–39.
• Leavitt, N. (2009). Anonymization Technology Takes a High Profile. IEEE Computer Society, (November), 15–18.
• Ling, Z., Luo, J., Yu, W., & Fu, X. (2011). Equal-Sized Cells Mean Equal-Sized Packets in Tor? 2011 IEEE International Conference on Communications (ICC), 1–6.
• Lu, W., & Ghorbani, A. A. (2008). Botnets Detection Based on IRC-Community. IEEE GLOBECOM 2008 - 2008 IEEE Global Telecommunications Conference, (1), 1–5.
• Lu, W., Tavallaee, M., & Ghorbani, A. A. (2009). Automatic discovery of botnet communities on large-scale communication networks. Proceedings of the 4th International Symposium on Information, Computer, and Communications Security - ASIACCS ’09, 1.
• McCusker, R. (2006). Transnational organised cyber crime: distinguishing threat from reality. Crime, Law and Social Change, 46(4-5), 257–273.
• Motoyama, M., McCoy, D., Levchenko, K., Savage, S., & Voelker, G. M. (2011). An analysis of underground forums. Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conference - IMC ’11, 71.
• Moore, T., & Clayton, R. (2009). Evil Searching: Compromise and Recompromise of Internet Hosts for Phishing. Financial Cryptography and Data Security, 256–272.
• Muller, P. (2006). Reputation, Trust and the Dynamics of Leadership in Communities of Practice. Journal of Management and Governance, 10(4).
• Radianti, J. (2010). A Study of a Social Behavior inside the Online Black Markets. 2010 Fourth International Conference on Emerging Security Information, Systems and Technologies, 88–92.
• Radianti, J., Rich, E., & Gonzalez, J. J. (2007). Using a Mixed Data Collection Strategy to Uncover Vulnerability Black Markets. Workshop for Information Security and Privacy.
• Radianti, J., Rich, E., & Gonzalez, J. J. (2009). Vulnerability Black Markets: Empirical Evidence and Scenario Simulation. 42nd Hawaii International Conference on, 1–10.
• Rieck, K., Trinius, P., Willems, C., & Holz, T. (2011). Automatic Analysis of Malware Behavior using Machine Learning. Journal of Computer Security, 1–30.
• Spencer, J. F. (2008). Using XML to map relationships in hacker forums. Proceedings of the 46th Annual Southeast Regional Conference on XX - ACM-SE 46, 487.
• Tschorsch, F., & Scheuermann, B. (2011). Tor is unfair — And what to do about it. 2011 IEEE 36th Conference on Local Computer Networks, 432–440.
• Turrini, E. (2010). Cybercrimes: A Multidisciplinary Analysis. Springer Publishing.
• Yadav, S., Reddy, A. K. K., & Reddy, A. L. N. (2010). Detecting Algorithmically Generated Malicious Domain Names. Proceedings of the 10th ACM SIGCOMM conference on Internet measurement.
• Yip, M. (2011). An Investigation into Chinese Cybercrime and the Applicability of Social Network Analysis. ACM Web Science Conference.
• Yip, M., Shadbolt, N., & Webber, C. (2013). Why Forums? An Empirical Analysis into the Facilitating Factors of Carding Forums. ACM Web Science, May.
• Zhang, L., Yu, S., Wu, D., & Watters, P. (2011). A Survey on Latest Botnet Attack and Defense. 2011 IEEE 10th International Conference on Trust, Security and Privacy in Computing and Communications, 53–60.
• Zhu, Z., Lu, G., Chen, Y., Fu, Z. J., Roberts, P., & Han, K. (2008). Botnet Research Survey. 2008 32nd Annual IEEE International Computer Software and Applications Conference, 967–972.
• Zhuge, J., Holz, T., Song, C., Guo, J., & Han, X. (2008). Studying Malicious Websites and the Underground Economy on the Chinese Web. Workshop on the Economics of Information Security, 225–244.