机器学习研究及最新进展

Transcript 机器学习研究及最新进展

Artificial Immune System
and Its Applications
Prof. Ying TAN
National Laboratory on Machine Perception
Department of Intelligence Science
Peking University, Beijing 100871, P.R.China
2015/7/17
Y. Tan---Artificial Immune Sys.
1
Contents
•
•
•
•
•
Biological Immune System
Artificial Immune System
Basic Algorithms of AIS
AIS design procedure
Case Studies
– Malicious Executable Detection
– Film Recommender
New
• Immuneocomputing – IC
• Danger Theory
• Future
2015/7/17
Y. Tan---Artificial Immune Sys.
2
The Immune System is…
Immune system: a system that
protects the body from foreign
substances and pathogenic
organisms by producing the
immune response
Immunity: state or quality of
being resistant (immune), either
by virtue of previous exposure
(adaptive immunity) or as an
inherited trait (innate immunity)
2015/7/17
Y. Tan---Artificial Immune Sys.
3
Why is the Immune System?
Immune system has following appealing features:
• Recognition
– Anomaly detection
– Noise tolerance
•
•
•
•
•
•
•
•
•
Robustness
Feature extraction
Diversity
Reinforcement learning
Memory;
Dynamically changing coverage
Distributed
Multi-layered
Adaptive
2015/7/17
Y. Tan---Artificial Immune Sys.
4
Role of Biological Immune System
• Protect our bodies from pathogen and
viruses
• Primary immune response
– Launch a response to invading pathogens
• Secondary immune response
– Remember past encounters
– Faster response the second time around
2015/7/17
Y. Tan---Artificial Immune Sys.
5
Immune cells
• There are two primarily types of
lymphocytes:
– B-lymphocytes (B cells)
– T-lymphocytes (T cells)
• Others types include macrophages,
phagocytic cells, cytokines, etc.
2015/7/17
Y. Tan---Artificial Immune Sys.
6
Where is it?
Primary lymphoid organs
Secondary lymphoid organs
Tonsils and adenoids
Thymus
Spleen
Peyer’s patches
Appendix
Lymph nodes
Bone marrow
Lymphatic vessels
2015/7/17
Y. Tan---Artificial Immune Sys.
7
Multiple layers of the immune system
Pathogens
Skin
Biochemical
barriers
Phagocyte
Innate
immune
response
Lymphocytes
Adaptive
immune
response
2015/7/17
Y. Tan---Artificial Immune Sys.
8
Antigen
• Substances capable of starting a
specific immune response commonly
are referred to as antigens
• This includes some pathogens such as
viruses, bacteria, fungi etc .
2015/7/17
Y. Tan---Artificial Immune Sys.
9
Biological Immune System
vs
Innate
Cell Mediated
Acquired
vs
Humoral
T Cell (Helper)
B Cell
Secretes
T Cell (Killer)
2015/7/17
Antibody
Y. Tan---Artificial Immune Sys.
10
How does IS work: A simplistic view
2015/7/17
Y. Tan---Artificial Immune Sys.
11
Self/Non-Self Recognition
• Immune system needs to be able to
differentiate between self and non-self
cells
• Antigenic encounters may result in cell
death, therefore
– Some kind of positive selection
– Some element of negative selection
2015/7/17
Y. Tan---Artificial Immune Sys.
12
Immune Pattern Recognition
BCR or Antibody
B-cell Receptors (Ab)
Epitopes
Antigen
B-cell
• The immune recognition is based on the complementarity
between the binding region of the receptor and a portion of the
antigen called epitope.
• Antibodies present a single type of receptor, antigens might
present several epitopes.
– This means that each antibody can recognize a single
antigen
2015/7/17
Y. Tan---Artificial Immune Sys.
13
Clonal Selection
Clonal deletion
(negative selection)
Self-antigen
Proliferation
(Cloning)
M
M
Antibody
Memory cells
Selection
Differentiation
Plasma cells
Foreign antigens
Self-antigen
Clonal deletion
(negative selection)
2015/7/17
Y. Tan---Artificial Immune Sys.
14
Main Properties of Clonal
Selection (Burnet, 1978)
• Elimination of self antigens
• Proliferation and differentiation on contact of
mature lymphocytes with antigen
• Restriction of one pattern to one differentiated
cell and retention of that pattern by clonal
descendants;
• Generation of new random genetic changes,
subsequently expressed as diverse antibody
patterns by a form of accelerated somatic
mutation
2015/7/17
Y. Tan---Artificial Immune Sys.
15
Immune Network Theory
• Idiotypic network (Jerne, 1974)
• B cells co-stimulate each other
– Treat each other a bit like antigens
• Creates an immunological memory
Suppression
Negative response
Paratope
Ag
1
2
Idiotope
3
Antibody
Activation
Positive response
2015/7/17
Y. Tan---Artificial Immune Sys.
16
Reinforcement Learning and
Immune Memory
• Repeated exposure to an antigen
throughout a lifetime
• Primary, secondary immune responses
• Remembers encounters
– No need to start from scratch
– Memory cells
• Continuous learning
2015/7/17
Y. Tan---Artificial Immune Sys.
17
Learning (2)
Antibody Concentration
Lag
Lag
Response
to Ag1
Lag
Response
to Ag1
...
...
Antigen Ag1
2015/7/17
Cross-Reactive
Response
Secondary Response
Primary Response
...
Response to
Ag1’=Ag1 + Ag3
Response
to Ag2
Antigens
Ag1, Ag2
Y. Tan---Artificial Immune Sys.
...
Antigen
Ag1 + Ag3
Time
18
Back
Immune System: Summary
• Define host (body cells) from external entities.
• When an entity is recognized as foreign (or
dangerous)- activate several defense
mechanisms leading to its destruction (or
neutralization).
• Subsequent exposure to similar entity results in
rapid immune response.
• Overall behavior of the immune system is an
emergent property of many local interactions.
2015/7/17
Y. Tan---Artificial Immune Sys.
19
Back
Immune metaphors
Other areas
Idea!
Idea ‘
Immune System Artificial Immune
Systems
2015/7/17
Y. Tan---Artificial Immune Sys.
20
What is an Artificial Immune
System?
Definition
Dasgupta’99: “Artificial immune systems (AIS) are
intelligent and adaptive systems inspired by the
immune system toward real-world problem solving”
de Castro and Timmis: “Artificial Immune Systems
(AIS) are adaptive systems, inspired by
theoretical immunology and observed immune
functions, principles and models, which are
applied to problem solving”
http://www.cs.kent.ac.uk/people/staff/jt6/aisbook/
•Using natural immune system as a metaphor for solving complex computational problems.
•Not modelling the immune system
2015/7/17
Y. Tan---Artificial Immune Sys.
21
AI models and their
corresponding natural prototypes
Natural prototype Biological level
AI model
Natural language
Left hemisphere
of brain
Formal logic
Formal linguistic
Brain nervous net
Cells
Biological cells
Cells
Neural computing (NC)
Neural networks (NN)
Cellular automata (CA)
Molecules of
proteins
Genetic code
Molecular
2015/7/17
Molecular
Artificial immune
systems (AIS)
Genetic Algorithms
(GA)
Y. Tan---Artificial Immune Sys.
22
Some History
• Developed from the field of theoretical
immunology in the mid 1980’s.
– Suggested we ‘might look’ at the IS
• 1990 – Bersini first use of immune
algorithms to solve problems
• Forrest et al – Computer Security mid
1990’s
• Hunt et al, mid 1990’s – Machine learning
• More……
2015/7/17
Y. Tan---Artificial Immune Sys.
23
AIS’ Scope
•
•
•
•
•
•
•
•
•
•
•
•
•
Pattern recognition;
Fault and anomaly detection;
Data analysis;
Data mining (classification/clustering)
Agent-based systems;
Scheduling;
Machine-learning;
Autonomous navigation and control;
Search and optimization methods;
Artificial life;
Security of information systems;
Optimization;
Just to name a few.
2015/7/17
Y. Tan---Artificial Immune Sys.
24
Back
Typical Applications of AIS
• Computer Security(Forrest’94’96’98, Kephart’94, Lamont’98’01,02,
Dasgupta’99’01, Bentley’00’01,02)
• Anomaly Detection (Dasgupta’96’01’02)
• Fault Diagnosis (Ishida’92’93, Ishiguro’94)
• Data Mining & Retrieval (Hunt’95’96, Timmis’99’01, ’02)
• Pattern Recognition (Forrest’93, Gibert’94, de Castro ’02)
• Adaptive Control (Bersini’91)
• Job shop Scheduling (Hart’98, ’01, ’02)
• Chemical Pattern Recognition (Dasgupta’99)
• Robotics (Ishiguro’96’97,Singh’01)
• Optimization (DeCastro’99,Endo’98, de Castro ’02)
• Web Mining (Nasaroui’02,Secker’05)
• Fault Tolerance (Tyrrell, ’01, ’02, Timmis ’02)
• Autonomous Systems (Varela’92,Ishiguro’96)
• Engineering Design Optimization (Hajela’96 ’98, Nunes’00)
2015/7/17
Y. Tan---Artificial Immune Sys.
25
Basic Immune Models and
Algorithms
•
•
•
•
•
Bone Marrow Models
Negative Selection Algorithms
Clonal Selection Algorithm
Immune Network Models
Somatic Hypermutation
2015/7/17
Y. Tan---Artificial Immune Sys.
26
Bone Marrow Models
• Gene libraries are used to create antibodies
from the bone marrow
• Antibody production through a random
concatenation from gene libraries
• Simple or complex libraries
An individual genome corresponds to four libraries:
Library 1
A1 A2 A3 A4 A5 A6 A7 A8
A3
Library 2
Library 3
B1 B2 B3 B4 B5 B6 B7 B8
Library 4
C1 C2 C3 C4 C5 C6 C7 C8
B2
D1 D2 D3 D4 D5 D6 D7 D8
C8
A3
B2
C8
D5
A3 B2 C8 D5
D5
= four 16 bit segments
= a 64 bit chain
Expressed Ab molecule
2015/7/17
Y. Tan---Artificial Immune Sys.
27
Negative Selection (NS) Algorithms
• Forrest 1994: Idea taken from the negative
selection of T-cells in the thymus
• Applied initially to computer security
• Split into two parts:
– Censoring
– Monitoring
Self
strings (S)
Generate
random strings
(R0)
Detector Set
(R)
Match
No
Detector
Set (R)
Protected
Strings (S)
Yes
Reject
No
Yes
Non-self
Detected
Censoring
2015/7/17
Match
Y. Tan---Artificial Immune Sys.
Monitoring
28
Clonal Selection Algorithm (de
Castro & von Zuben, 2001)
1. Initialisation: Randomly initialise a population (P)
2. Antigenic Presentation: for each pattern in Ag, do:
2.1 Antigenic binding: determine affinity to each P
2.2 Affinity maturation: select n highest affinity from P and
clone and mutate prop. to affinity with Ag, then add new
mutants to P
3. Metadynamics:
3.1 select highest affinity P to form part of M
3.2 replace n number of random new ones
4. Cycle: repeat 2 and 3 until stopping criteria (e.g. Max Generation)
2015/7/17
Y. Tan---Artificial Immune Sys.
29
CLONALG for
PR, Learning,
Optimization
Agj
Ab{d}
Abj*
Ab {r}
Ab {m}
fj
Select
Select
Fj*
Ab {n}
Cj*
Clone
L.N. de Castro, et.al., Learning and
optimization using the clonal selection
principle, IEEE Trans. Evolutionary
computation, vol.6, no.3, June 2002, pp.239251
2015/7/17
Select
Y. Tan---Artificial Immune Sys.
Cj
30
Discrete Immune Network
Models (Timmis & Neal, 2001)
1.
2.
Initialisation: create an initial network from a sub-section of the antigens
Antigenic presentation: for each antigenic pattern, do:
2.1 Clonal selection and network interactions: for each network cell,
determine its stimulation level (based on antigenic and network interaction)
2.2 Metadynamics: eliminate network cells with a low stimulation
2.3 Clonal Expansion: select the most stimulated network cells and
reproduce them proportionally to their stimulation
2.4 Somatic hypermutation: mutate each clone
2.5 Network construction: select mutated clones and integrate
3. Cycle: Repeat step 2 until termination condition is met
2015/7/17
Y. Tan---Artificial Immune Sys.
31
Immune Network Models
• Timmis & Neal, 2000
• Used immune network theory as a basis,
proposed the AINE algorithm
Initialize AIN
For each antigen
Present antigen to each ARB in the AIN
Calculate ARB stimulation level
Allocate B cells to ARBs, based on stimulation level
Remove weakest ARBs (ones that do not hold any B cells)
If termination condition met
exit
else
Clone and mutate remaining ARBs
Integrate new ARBs into AIN
2015/7/17
Y. Tan---Artificial Immune Sys.
32
Immune Network Models
• De Castro & Von Zuben (2000c)
• aiNET, based in similar principles
At each iteration step do
For each antigen do
Determine affinity to all network cells
Select n highest affinity network cells
Clone these n selected cells
Increase the affinity of the cells to antigen by reducing the
distance between them (greedy search)
Calculate improved affinity of these n cells
Re-select a number of improved cells and place into matrix M
Remove cells from M whose affinity is below a set threshold
Calculate cell-cell affinity within the network
Remove cells from network whose affinity is below
a certain threshold
Concatenate original network and M to form new network
Determine whole network inter-cell affinities and remove all those
below the set threshold
Replace r% of worst individuals by novel randomly generated ones
Test stopping criterion
2015/7/17
Y. Tan---Artificial Immune Sys.
33
Back
Somatic Hypermutation
• Mutation rate in proportion to affinity
• Very controlled mutation in the natural immune
system
• Trade-off between the normalized antibody
affinity D* and its mutation rate ,
1
0.9
0.8
0.7
 = 5
0.6

0.5
 = 10
0.4
 = 20
0.3
0.2
0.1
0
0
2015/7/17
0.1
0.2
0.3
0.4
0.5
D*
0.6
Y. Tan---Artificial Immune Sys.
0.7
0.8
0.9
1
34
General Framework of AIS
Solution
Immune Algorithms
Affinity Measures
Representation
Problem
2015/7/17
Application Domain
Y. Tan---Artificial Immune Sys.
35
Representation – Shape Space
• Describe the general shape of a molecule
•Describe interactions between molecules
•Degree of binding between molecules
2015/7/17
Y. Tan---Artificial Immune Sys.
36
Representation
• Vectors
•
•
•
•
Ab = Ab1, Ab2, ..., AbL
Ag = Ag1, Ag2, ..., AgL
Real-valued shape-space
Integer shape-space
Binary shape-space
Symbolic shape-space
2015/7/17
Y. Tan---Artificial Immune Sys.
37
Define their Interaction
• Define the term Affinity
• Affinity is related to distance
– Euclidian
D
L
2
(
Ab

Ag
)
 i
i
i 1
• Other distance measures such as Hamming,
Manhattan etc. etc.
• Affinity Threshold
2015/7/17
Y. Tan---Artificial Immune Sys.
38
Shape Space Formalism
• Repertoire of the
immune system is
complete (Perelson,
Ve
e
Ve
´
2015/7/17
Y. Tan---Artificial Immune Sys.
e
´
1989)
• Extensive regions of
complementarity
• Some threshold of
recognition
V
´
´
´
Ve
e
´
´
39
AIS Design
Back
• Problem description
• Deciding the immune principles used for
problem solving
• Engineering the AIS
–
–
–
–
Defining the types of immune components used
Defining the representation for the elements of the AIS
Applying immune principle to problem solving
The meta-dynamics of an AIS
• Reverse mapping from AIS to the real problem
2015/7/17
Y. Tan---Artificial Immune Sys.
40
Back
Case Studies of AIS
• Malicious Executables Detection --From Z.H. Guo, Z.K. Liu, and Y. Tan, An NNbased Malicious Executables Detection Algorithm
based on Immune Principles, F.Yin, J.Wang, C.
Guo (Eds.): ISNN 2004, Springer, Lecture Notes
in Computer Science 3174, pp. 675-680, 2004.
(http://dblp.uni-trier.de)
• Film Recommender --- From Dr. Dr Uwe
Aickelin (http://www.aickelin.com), University of
Nottingham, U.K. 2004
2015/7/17
Y. Tan---Artificial Immune Sys.
41
New!
Immuneocomputing -- IC
By Tarakanov, A. 2001.
Aims of
• A proper mathematical framework;
• A new kind of computing;
• A new kind of hardware.
New concepts of
formal protein (FP)
------- vs. neuron
formal immune networks (FIN)------- vs. NN
Refer to
2015/7/17
•A.O. Tarakanov, V.A. skormin, and S.P. Sokolova,
Immunocomputing: Principles and Applications, Springer, 2003.
Y. Tan---Artificial Immune Sys.
42
Problems of Traditional Self/Non-self View
•
•
•
•
•
•
No reaction to foreign bacteria in gut (friendly
bacteria…).
No reaction to food / air / etc.
The human body changes over its life.
Auto-immune diseases.
How do we produce antibodies that react against
antigens and yet avoid self?
Is it necessary to attack all non-self or a specific self?
2015/7/17
Y. Tan---Artificial Immune Sys.
43
New!
The Danger Theory
• In the danger model, the idea is to recognise ‘danger’
rather than non self.
• The screening is accomplished post production through
an external ‘danger’ signal. Thus the production of
autoreactive antibodies (which react to self) is allowed.
• If an (e.g. autoreactive) antibody matches a stimulus in
the absence of danger, it is removed. Thus harmless
antigens are tolerated, and changing self accommodated.
Matzinger (2002). The Danger Model: A renewed sense of self , Science 296:
301-304.
2015/7/17
Y. Tan---Artificial Immune Sys.
44
Danger Theory (con’t)
• Danger Theory
– Not self/non-self but Danger/Non-Danger
– Immune response is initiated in the tissues.
Danger Zone.
– This makes it context dependant
• Matzinger (2002) The Danger Model: A renewed sense of self
Science 296: 301-304
• Aickelin & Cayzer (2002) The Danger Theory and Its Application
to Artificial Immune Systems, Proc. International Conference on AIS
(ICARIS 2002)
2015/7/17
Y. Tan---Artificial Immune Sys.
45
Danger Zone
Stimulation
Danger
Zone
Match, but
too far
No match away
Antibodies
Antigens
Cells
Damaged Cell
Danger Signal
2015/7/17
Y. Tan---Artificial Immune Sys.
46
Towards a ‘dangerous’ IDS
“The danger theory suggests that the
immune system reacts to threats based on
the correlation of various (danger) signals,
providing a method of ‘grounding’ the
immune response, i.e. linking it directly to
the attacker.”
Aickelin U, Bentley P, Cayzer S, Kim J and McLeod J (2003): 'Danger
Theory: The Link between AIS and IDS?', Proceedings ICARIS-2003, 2nd
International Conference on Artificial Immune Systems, LNCS 2787, pp
147-155
2015/7/17
Y. Tan---Artificial Immune Sys.
47
Other ways of using danger
Danger = Crime, Antigen = Suspect
or...
Danger = Context ?
It could also be useful for data mining, where the ‘danger’
signal is a proxy measure of interest
‘Danger Zone’ can be spatial or temporal
Andrew Secker, Alex Freitas, and Jon Timmis (2005) “Towards a danger theory inspired
artificial immune system for web mining” in A Scime, editor, Web Mining: applications and
techniques, pages 145-168 (Idea Group)
2015/7/17
Y. Tan---Artificial Immune Sys.
48
Back
Some Recent Applications of
Danger Theory
• Anjum Iqbal, Mohd Aizaini Maarof, “Danger
Theory and Intelligent Data Processing,”
International Journal of Information Technology,
Vol.1, No.1, 2004.
• Andrew Secker, Alex A. Freitas, and Jon Timmis,
“A Danger Thory Inspired Approach to Web
Mining,” Computing Lab. University of Kent,
Canterbury, Kent, UK.2005
• So on.
2015/7/17
Y. Tan---Artificial Immune Sys.
49
The Future
•
•
•
•
More formal approach required?
Wide possible application domains.
What makes the immune system
unique?
More work with immunologists:
– Danger theory.
– Idiotypic Networks.
– Self-Assertion.
2015/7/17
Y. Tan---Artificial Immune Sys.
50
Reference for further reading
Books
• Artificial Immune Systems and Their
Applications by Dipankar Dasgupta (Editor)
Springer Verlag, January 1999.
• L.N. de Castro and J. Timmis, Artificial Immune
Systems: A New Computational Intelligence
Approach, Springer, 2002.
• A.O. Tarakanov, V.A. skormin, and S.P. Sokolova,
Immunocomputing: Principles and Applications,
Springer, 2003.
Related academic papers
• J. Timmis, P.Bentley, and Emma Hart (Eds.): Artificial Immune Systems,
Proceedings of Second International Conference, ICARIS 2003,
Edinburgh, UK, September 2003. LNCS 2787, Springer.
2015/7/17
Y. Tan---Artificial Immune Sys.
51
New Events:
• Special Session on Artificial Immune Systems at the Congress
on Evolutionary Computation (CEC), December 8-12, 2003,
Canberra, Australia.
• Special Session on Immunity-Based Systems at Seventh
International Conference on Knowledge-Based Intelligent
Information & Engineering Systems (KES), September 3-5,
2003, University of Oxford, UK.
• Second International Conference on Artificial Immune Systems
(ICARIS), September 1-3, 2003, Napier University, Edinburgh,
UK.
• Tutorial on Artificial Immune Systems at 1st Multidisciplinary
International Conference on Scheduling: Theory and
Applications (MISTA), 12 August 2003, The University of
Nottingham, UK.
• Tutorial on Immunological Computation at International Joint
Conference on Artificial Intelligence (IJCAI), August 10, 2003,
Acapulco, Mexico.
•
Special Track on Artificial Immune Systems at Genetic and
Evolutionary Computation Conference (GECCO), Chicago, USA,
July 12-16, 2003
2015/7/17
Y. Tan---Artificial Immune Sys.
52
AIS Resources
• Artificial Immune Systems and Their Applications by D
Dasgupta (Editor), Springer Verlag, 1999.
• Artificial Immune Systems: A New Computational
Intelligence Approach by L de Castro, J Timmis, Springer
Verlag, 2002.
• Immunocomputing: Principles and Applications by A
Tarakanov et al, Springer Verlag, 2003.
• Third International Conference on Artificial Immune Systems
(ICARIS), September 13-16, 2004, University of Catania, Italy.
• 4th International Conference on Artificial Immune
Systems(ICARIS), 14th-17th August, 2005 in Banff,
Alberta, Canada
2015/7/17
Y. Tan---Artificial Immune Sys.
53
First Page
That’s all
2015/7/17
Y. Tan---Artificial Immune Sys.
54
Case Study 1:
Malicious Executables Detection
based on Artificial Immune Principles*
From Z.H. Guo, Z.K. Liu, and Y. Tan, An NN-based Malicious
Executables Detection Algorithm based on Immune Principles, F.Yin,
J.Wang, C. Guo (Eds.): ISNN 2004, Springer Lecture Notes on
Computer Science 3174, pp. 675-680, 2004. (http://dblp.uni-
trier.de)
* This work was supported by Natural Science Foundation
of China with Grant No. 60273100.
2015/7/17
Y. Tan---Artificial Immune Sys.
55
tens of thousands of
new viruses / year
Appear!
But: Current antivirus systems
attempt to detect these new
malicious programs with
heuristics by hand (costly
and ineffective)
Dos/Win32 viruses
Computers / Information Systems
Trojan horses
Worms
eMail attached viruses
Current Task:
Devise new methods
for detecting new ME
Malicious executables
2015/7/17
Y. Tan---Artificial Immune Sys.
58
Back
Previous Related Works
• Signature-based Methods
• Expert Knowledge-based Methods
• Machine Learning Methods
2015/7/17
Y. Tan---Artificial Immune Sys.
61
Back
Signature-based Methods
It creates a unique tag for each malicious program so that future
examples of it can be correctly classified with a small error rate.
And relies on signatures of known malicious executable to generate
detection models.
Drawbacks:
• Can not detect unknown and mutated viruses.
• As increase of the number and type of viruses, its detection speed
become slow dramatically. At the same time, the analysis of the
signatures of viruses become very difficult, in particular, for the
encrypted signatures.
(refer to IBM Anti-virus Group’s report: R.W. Lo, K.N. Levitt, and R.A.
Olsson. MCF: a Malicious Code Filter. Computers & Security,
14(6):541–566., 1995.)
2015/7/17
Y. Tan---Artificial Immune Sys.
62
Expert Knowledge-based
Methods
Back
Using the knowledge of a group of virus
experts to construct heuristic classifiers
for detection of unknown viruses.
Drawbacks:
• Time-consuming analysis method.
• Only discover some unknown viruses, but its false
detection rate is very high.
For detecting unknown virus based on ANN, IBM Anti-virus
Group also proposes one method to detect Boot Sector
viruses only.
(refer to W. Arnold and G. Tesauro. Automatically Generated Win32 Heuristic
Virus Detection. Proceedings of the 2000 International Virus Bulletin
Conference, 2000.)
2015/7/17
Y. Tan---Artificial Immune Sys.
63
Back
Machine Learning Methods
• M.G. Schultz developed a framework that used
data mining algorithms, i.e., Multi-Naïve Bayes
method, to train multiple classifiers on a set of
malicious and benign executables to detect new
examples (unknown ME).
(refer to M.G. Schultz.,E. Eskin and E. Zadok . Data Mining Methods for
Detection of New Malicious Executables. IEEE Symposium on Security
and Privacy, May 2001.)
2015/7/17
Y. Tan---Artificial Immune Sys.
64
Biologically-motivated Information
Processing Systems
• Brain-nervous systems – Neural Networks (NN)
• Genetic systems – Genetic Algorithms(GA)
• Immune systems – Artificial Immune Systems(AIS)
or immunological computation.
NN and GA have extensively studied with wide
applications but AIS has relative few applications
2015/7/17
Y. Tan---Artificial Immune Sys.
65
Natural prototypes vs. their models
Natural
prototype
Biological
level
Natural language Left
hemisphere of
brain
Brain nervous
Cells
net
Biological cells
Cells
Molecules of
proteins
Genetic code
2015/7/17
Molecular
Molecular
Computing model
Formal logic
Formal linguistic
Artificial Neural
networks (ANN)
Cellular automata
(CA)
Artificial immune
systems (AIS)
Genetic Algorithms
(GA)
Y. Tan---Artificial Immune Sys.
66
Comparison of Three Algorithms
GA (Optimisation)
NN (Classification)
AIS
Components
Chromosome Strings
Artificial Neurons
Attribute Strings
Location of
Components
Dynamic
Pre-Defined
Dynamic
Structure
Discrete Components
Networked Components
Discrete components /
Networked Components
Knowledge Storage
Chromosome Strings
Connection Strengths
Component
Concentration / Network
Connections
Dynamics
Evolution
Learning
Evolution / Learning
Meta-Dynamics
Recruitment / Elimination
of Components
Construction / Pruning of
Connections
Recruitment / Elimination
of Components
Interaction between
Components
Crossover
Network Connections
Recognition / Network
Connections
Interaction with
Environment
Fitness Function
External Stimuli
Recognition / Objective
Function
2015/7/17
Y. Tan---Artificial Immune Sys.
67
Back
Immune Principles for Malicious
Executable Detection
• Non-self Detection Principle
• Anomaly Detection Based on Thickness
• The Diversity of Detector Representation
vs. Anomaly Detection Hole
2015/7/17
Y. Tan---Artificial Immune Sys.
68
Non-self Detection Principle
• For natural immune system, all cells of body are
categorized as two types of self and non-self. The
immune process is to detect non-self from cells.
• To realize the non-self detection, the maturation
process of lymphocytes T cell undergoes two
selection stages of Positive Selection and Negative
Selection since antigenic encounters may result in
cell death. Some computer scientists inspired by
these two stages had proposed some algorithms
used to detect anomaly information. Here, we will
use the Positive Selection Algorithm (PSA) to
perform the non-self detection for recognizing the
malicious executable.
2015/7/17
Y. Tan---Artificial Immune Sys.
69
Back
Non-self Detection by PSA
Detector Set Dl
Short sequence to
be detected
（Its length is l）
N
Match ?
？
Y
self
non-self
Process of anomaly detection with PSA
2015/7/17
Y. Tan---Artificial Immune Sys.
70
Back
Anomaly Detection Based on
Thickness
• Anomaly recognition process is one
process that immune cells detect
antigens and are activated.
• The activated threshold of immune cells
is decided by the thickness of immune
cells matching antigens.
2015/7/17
Y. Tan---Artificial Immune Sys.
71
The Diversity of Detector Representation
vs. Anomaly Detection Hole
• The main difficulty of anomaly detection is utmost
decreasing the anomaly detection hole. The natural
immune system resolves this problem well by use of the
diversity of MHC (Major Histocompatibility Complex) cell
representations, which decides the diversity of anti-body
touched in surface of T cells. This property is very useful
in increasing the power of detecting mutated antigens,
and decreasing the anomaly detection hole.
• According to the principle, we can use the diversity of
detector representation to decrease the anomaly
detection hole. As was illustrated by following schematic
drawings.
2015/7/17
Y. Tan---Artificial Immune Sys.
72
Schematic diagram of abnormal
detection holes (cont’)
Self Space
Abnormal
detection holes
Nonself Space
Detectors
2015/7/17
Y. Tan---Artificial Immune Sys.
73
back
Reduction of abnormal detection
holes by use of the diversity of
detector representations
Detector
Representation 1
Detector
Representation 2
Detector
Representation 3
Combination of detectors
2015/7/17
Y. Tan---Artificial Immune Sys.
74
Malicious Executable Detection
Algorithm (MEDA)
MEDA based on AIS includes three
parts,
• Detector generation,
• Anomaly information extraction ,
• and Classification.
2015/7/17
Y. Tan---Artificial Immune Sys.
75
Back
Flow Chart of Malicious Executable
Detection Algorithm (MEDA)
Gene
(…01101001…)
Generating detector set
Extracting
property
Update Gene
(…10101101…)
2015/7/17
anomaly
Executable to be detected
(…00111101…)
Y. Tan---Artificial Immune Sys.
MEDA
Classifier
Output
76
Generation of Detector Set
Detector generation algorithm:
• Begin initialize lstep、ld、k=0
•
Do cutting e(fk,n) from Eg(b)
•
i=0；
•
While i <= n-ld-1 do
•
Begin
•
d = Seq(e(fk,n),i, ld)；
•
if d≠€ Dld then Dld←d；
•
i=i+lstep；
•
End
•
k=k+1；
•
Until Eg(b) is empty
•
Return Dld ；
• End
2015/7/17
Y. Tan---Artificial Immune Sys.
77
Back
Illustration of Detector Generating
Process
File Hex Sequence: 56 32 12 0A 34 ED FF 00 2D…. . 00 0A 34 ED FF FA 11 00
Extracting Detector: 56 32 12
32 12 0A
12 0A 34
┋……………………………………………┋
FF FA 11
FA 11 00
Generating Process of 24-bit Detectors with 8-bit stepsize (ld=24, lstep=8)
2015/7/17
Y. Tan---Artificial Immune Sys.
78
Extraction of Anomaly Characteristics -Non-self Thickness (NST)
• Non-self Detection
• NST, as Anomaly Property, is defined
as the ratio of number of non-self units
to file binary sequence, pl=nn/(nn+ns).
• If there are m kinds of detectors, the file
has a NST Vector P=(pl1, pl2, …, plm)T.
2015/7/17
Y. Tan---Artificial Immune Sys.
79
NST Extraction Diagram
Initialization，choose lstep、ld , Dl
“Nonself” Detection
File to be detected
（…00111101）
Y
Is “Nonself” ?
N
ns add 1
N
nn add 1
Completing
Y detection ？
Compute pl=nn/(nn+ns)
End
2015/7/17
Y. Tan---Artificial Immune Sys.
80
Back
NST Extraction Algorithm
• Begin open e(fk,n);
•
Select lstep, ld;
•
Set ns=0, nn=0, i=0;
•
While i <= n-ld-1 do
•
Begin
•
s = Seq(e(fk,n),i, ld)；
•
if s ≠€ Dld then nn = nn+1；
•
else ns = ns + 1；
•
i = i + lstep；
•
End
•
pld = nn / ( ns＋nn )；
•
Return pld ；
• End
2015/7/17
Y. Tan---Artificial Immune Sys.
81
BP Network Classifier
• We use Anomaly Property Vector
(APV), i.e., NST vector P, as input
variable of two-layer BP network
classifier. The number of nodes of
input layer equals to APV’s
dimension.
• The Sigmoid transfer function is
chosen for the hidden layer and
Linear function for the output layer.
2015/7/17
Y. Tan---Artificial Immune Sys.
82
Back
BP Network Classifier Structure
Non-Self Thickness (NST) Vector
2015/7/17
pl1
pl2
Out (1-ME, 0-BE)
P
plm
Y. Tan---Artificial Immune Sys.
83
Back
Experiments and Discussion
• Experimental Data Set
• Generation of Detector Set
• Experimental Result Using Single Detector Set
• Experimental Result Using Multi-Detector Set
2015/7/17
Y. Tan---Artificial Immune Sys.
84
Back
Experimental Data Set
Type
Files
Remarks
BE
915
Win 2K OS and some
application programs.
ME
3566 DOS virus, Win32 virus, Trojan,
Total
4481
Worm, etc. from Internet.
All Justified by Antivirus
cleaner Tools
• BE—Benign Executable
• ME—Malicious Executable
2015/7/17
Y. Tan---Artificial Immune Sys.
85
Back
Generation of Detector Set
• Eg(b) is Gene of generating detector, ld ∈{16，
24，32，64，96}, and lstep=8bits. By using
the detector generating algorithm, we can get
D16, D24, D32, D64, and D96, separately.
Table1: Detectors generation result
Code Length ld
16
24
|Dld|
65536
10,931,62 8,938,35
7
2
store
structure
Bitmap Bitmap
Index
Index
2015/7/17
32
Tree
Y. Tan---Artificial Immune Sys.
64
96
12,768,36 21,294,85
1
7
Tree
Tree
86
异己”浓度P24
NST p24
正确检测率(Detection Rate)%
Detection Result of Malicious
Executables by D24
File No.
(a) NST of files, where symbol
‘x’ represents benign program (Red),
‘□’ malicious program (Blue)
2015/7/17
误判率(False Positive Rate)%
Y. Tan---Artificial Immune Sys.
(b) ROC Curve
87
异己”浓度P32
NST p32
正确检测率(Detection Rate)%
Detection Result of Malicious
Executables by D32
文件序号
误判率(False Positive Rate)%
(a) NST of files, where symbol
‘x’ represents benign program,
‘□’ malicious program
2015/7/17
Y. Tan---Artificial Immune Sys.
(b) ROC Curve
88
异己”浓度P64
NST p64
正确检测率(Detection Rate)%
Detection Result of Malicious
Executables by D64
文件序号
(a) NST of
files, where symbol
‘x’ represents benign program (Red),
‘□’ malicious program (Blue)
2015/7/17
误判率(False Positive Rate)%
Y. Tan---Artificial Immune Sys.
(b) ROC Curve
89
Experimental Result Using Single
Detector Set
Detection Rate (%)
100
80
60
16bits
24bits
32bits
64bits
96bits
40
20
Data Set
Data Set
Data Set
Data Set
Data Set
0
0
20
40
60
80
100
False Positive Rate (%)
2015/7/17
Y. Tan---Artificial Immune Sys.
90
When FPR is fixed, relationship
curves of DR versus Code
Length ld
Detection Rate (%)
Back
Code length ld（bits）
2015/7/17
Note: from theY.bottom
to up,
the FPR
Tan---Artificial
Immune
Sys. is 0%, 0.5%, 1%, 2%,
4%, 8%, and 16%, in sequence.
91
Experimental Result Using MultiDetector Set
• This experiment selects multi-detector set to detect
benign and malicious executables.
• We don’t use D16 because of its zero DR and also set
D96 as upper limit because almost same DR values
when ld ≥96.
• Here we selects D24, D32, D64 and D96 four detector
sets as anomaly detection data set, and uses them to
extract Non-self thickness (NST) vector, and finally a
BP network is exploited as classifier.
• For the process of classification, we randomly selects
30% files of E(b) as Eg(b) to train a BP network, and
use the remaining data to illustrate the anomaly
detection performance.
2015/7/17
Y. Tan---Artificial Immune Sys.
92
“异
Detection Rate (%)
“异己”浓度（64bits）
NST Distribution and ROC Curve of
Multi-Detector Set Method
己”
浓度
（2
4bi
ts）
”浓度
“ 异己
ts）
（32bi
False Positive Rate (%)
(a) NST of files for mixture of D24,
D32 and D64.
‘x’ benign program (in Red),
‘□’ malicious program (in Blue).
2015/7/17
(b) ROC Curve of mixed detector
set of D24, D32, D64 and D96
Y. Tan---Artificial Immune Sys.
93
Comparisons With Bayes Methods
and Signature-based Method
100
Detection Rate (%)
80
MEDA with BP Network
Naive Bayes with Strings
Multi-Naive Bayes with Bytes
Signature Method
60
40
20
0
0
2
4
6
8
10
12
False Positive Rate (%)
2015/7/17
Y. Tan---Artificial Immune Sys.
94
Back
Algorithm Complexities
Operation type 1
Operation type 2
Operation type 3
Name
Amount
Name
Amount
Name
Amount
MEDA
detectors
ltrain
detector
matching
≤80×ltest
Computing
NST
4×lf
additions
Bayes
Prob.
Info.
>>ltrain
Searching
P(Fi/C)
Depend
Computing lf float
on P(Fi/C) Joint Probs. multiplicaP(C ) P( F / C )
tions
Algorithm
Y. Tan---Artificial Immune Sys.
0.4Gb
1Gb
n
i 1
2015/7/17
Store
Space
i
95
Back
Remarks
• For short binary sequence and single detector
set for the detection of malicious executables,
the performance of D24 is the best, giving out
DR 80.6% with FPR 3%.
• For long code length of detector and multidetector set, our method obtains the best
performance of DR 97.46% with FPR 2%, over
current methods.
• This result verifies
– diversity of detector representation can decrease
anomaly detection holes.
– “non-self” thickness detection.
2015/7/17
Y. Tan---Artificial Immune Sys.
Back
96
Case Study 2:
Film Recommender
From Dr. Dr Uwe Aickelin (http://www.aickelin.com)
University of Nottingham, U.K.,

Prediction:
– What rating would I give a specific film?

Recommendation:
– Give me a ‘top 10’ list of films I might like.
2015/7/17
Y. Tan---Artificial Immune Sys.
97
Film Recommender (con’t 1)
•
•
•
•
•
•
•
EachMovie database (70k users).
User Profile: set of tuples {movie, rating}.
Me: My user profile.
Neighbour: User profile of others.
Similarity metric: Correlation score.
Neighbourhood: Group of similar users.
Recommendations: From neighbourhood.
2015/7/17
Y. Tan---Artificial Immune Sys.
98
Film Recommender (con’t 2)
Antigen
Antibody
•
•
•
•
User Profile: set of tuples {movie, rating}
Me: My user profile.
Neighbour: User profile of others.
Affinity metric: Correlation score.
Antibody – Antigen Binding
Antibody – Antibody Binding
• Neighbourhood: Group of similar users.
Group of antibodies similar to antigen and dissimilar to other antibodies
• Recommendations: From neighbourhood
Weighted Score based on Similarities.
2015/7/17
Y. Tan---Artificial Immune Sys.
99
Film Recommender (con’t 3)
• Start with empty AIS.
• Encode target user as an antigen Ag.
• WHILE (AIS not full) && (More Users):
– Add next user as antibody Ab.
– IF (AIS at full size) Iterate AIS.
• Generate recommendations from AIS.
2015/7/17
Y. Tan---Artificial Immune Sys.
100
Film Recommender (con’t 4)
Suppose we have 5 users and 4 movies:
–
–
–
–
–
u1={(m1,v11),(m2,v12),(m3,v13)}.
u2={(m1,v21),(m2,v22),(m3,v23),(m4,v24)}.
u3={(m1,v31),(m2,v32),(m4,v34)}.
u4={(m1,v41),(m4,v44)}.
u5={(m1,v51),(m2,v52),(m3,v53), (m4,v54)}.
• We do not have users’ votes for every film.
• We want to predict the vote of user u4 on movie
m3.
2015/7/17
Y. Tan---Artificial Immune Sys.
101
Algorithm walkthrough (1)
AIS
Start with empty AIS:
DATABASE
u1, u2, u3, u4, u5
User for whom to predict becomes
antigen:
DATABASE
u4
u1, u2, u3, u5
2015/7/17
AIS
Y. Tan---Artificial Immune Sys.
Ag
102
Algorithm walkthrough (2)
Add antibodies until AIS is full…
AIS
DATABASE
u1
Ag
u2, u3, u5
Ab1
AIS
DATABASE
u2,u3
Ag
Ab1
u4
Ab2
Ab3
2015/7/17
Y. Tan---Artificial Immune Sys.
103
Algorithm walkthrough (3)
• Table of Correlation between Ab
and Ag:
Ab3
Ab1
Ag
Ab2
2015/7/17
– MS14, MS24, MS34.
• Table of Correlation between
Antibodies:
– MS12 = CorrelCoef(Ab1, Ab2)
– MS13 = CorrelCoef(Ab1, Ab3)
– MS23 = CorrelCoef(Ab2, Ab3)
Y. Tan---Artificial Immune Sys.
104
Algorithm walkthrough (4)
• Calculate Concentration of each Ab:
– Interaction with Ag (Stimulation).
– Interaction with other Ab (Suppression).
AIS
AIS
Ag
Ag
Ab1 Ab2
Ab3
2015/7/17
Ab1
Ab2
Ab2 Ab
2
Ab1
Ab2
Ab2
Y. Tan---Artificial Immune Sys.
105
Algorithm walkthrough (5)
• Generate Recommendation based on
Antibody Concentration.
AIS
Ag
Ab2
Ab1
Ab2 Ab
2
Ab1
Ab2
Ab2
2015/7/17
Recommendation for
user u4 on movie m3
will be highly based on
vote on m3 of user u2
Y. Tan---Artificial Immune Sys.
106
Film Recommender Results
• Tested against standard method (Pearson
k-nearest neighbours).
• Prediction:
– Results of same quality.
• Recommendation:
– 4 out of 5 films correct (AIS).
– 3 out of 5 films correct (Pearson).
Back
2015/7/17
Y. Tan---Artificial Immune Sys.
107

机器学习研究及最新进展

Transcript 机器学习研究及最新进展

Directory