CS519 BGP Project Report Kai-Wen Chung (kc279) San-Yiu Cheng (sc345) How to Proceed BGP Analysis Collect Raw Data Import into Database Query Database and Analyze data.

Download Report

Transcript CS519 BGP Project Report Kai-Wen Chung (kc279) San-Yiu Cheng (sc345) How to Proceed BGP Analysis Collect Raw Data Import into Database Query Database and Analyze data.

CS519 BGP Project Report
Kai-Wen Chung (kc279)
San-Yiu Cheng (sc345)
How to Proceed BGP Analysis
Collect Raw Data
Import into Database
Query Database and
Analyze data
Collect Raw Data


MAE-EAST (1998.1 ~ 1998.11)
http://archive.routeviews.org/ (2003.1 ~
2003.3)
Database Schema

Original Schema
DataSet
Name
DID
FromFile
Descr
ImportDate
CollectDate
Type
int
varchar(255)
varchar(512)
DateTime
varchar(32)
Message
Name
DID
MsgID
MsgType
MsgTime
PeerIP
PeerAS
Prefix
PrefixMask
Origin
NextHop
LocalPref
Med
Community
Aggregate
AggregateIP
Type
int
int
varchar(1)
DateTime
bigint
int
bigint
int
varchar(16)
bigint
int
int
varchar(8)
varchar(8)
bigint
MsgPath
Field Name
DID
MsgID
PathOrder
ASID
Alternative
Type
int
int
tinyint
int
BIT
Database Schema (cont.)

Record Size



Message: 94 bytes/record
MsgPath: 18 bytes/record
# Record


Message: 104,841,405 (98.1 ~ 98.11)
MsgPath: 251,442,478 (98.1 ~ 98.11)
Database Schema (cont.)



Database space allocation: 20GB
About 12 hours to import raw data for 1
month (about 10,000,000 messages,
and 20,000,000 paths)
Data volume reaches limitation soon
Our Solution

Allocate larger space



Move Database from SQLServer ->
Sparrow
Total 70GB
Modify data schema to reduced record
size
Data Schema Modification
DataSet
Name
DID
FromFile
Descr
ImportDate
CollectDate
Type
int
varchar(255)
varchar(512)
DateTime
varchar(32)
OriginTable
Field Name Type
id
tinyint
origin
varchar(16)
AggrTable
Field Name Type
id
tinyint
aggr
varchar(16)
Message
Name
DID
MsgID
MsgType
MsgTime
PeerIP
PeerAS
Prefix
PrefixMask
Origin
NextHop
LocalPref
Med
Community
Aggregate
AggregateIP
Type
int
int
varchar(1)
DateTime
int
int
int
tinyint
tinyint
int
int
int
varchar(8)
tinyint
int
MsgPath
Field Name
DID
MsgID
PathOrder
ASID
Alternative
IPTable
Field Name
id
ipval
ip
Type
int
int
tinyint
int
BIT
Type
int
bigint
varchar(16)
Data Schema Modification

Record Size



Size Reduces



Message: 52 bytes/record
MsgPath: 14 bytes/record
Message: 46.9%
MsgPath: 22.2%
Faster Data Importing
Current Status

Database


P3-500 with 128MB ram, and Windows
2000 Server and SQL Server 2000
installed
Imported Data


1998.1 ~ 1998.11. About 21GB in DB
2003.3. About 34GB in DB
Current Database Issue

SQL Server Performance


It could take several hours to run a query
Space problem


70GB is only enough for data of 1 ~ 2
month (of 2003)
We need a “Tera-byte” Database to
accommodate all data of 2002, and 2003
Summary of Data

Total space used:


Number of Messages:


~55G (1998 and 03/2003)
~220.5 Million (1998 and 03/2003)
Number of DataSet:

~30,000 (1998 and 03/2003)
Summary of Data (cont.)

A small number of IP addresses dominate
the routing table


15 Source IP addresses occupy about 68% of
the PeerIp field of the Messages
15 Destination IP Addresses occupy about
47% of the NextHop field of the Messages
Summary of Data (cont.)

Advertisement Vs. Withdrawal Messages



There are about 220 Million Messages
~31.5% of all Messages are Withdrawal
Messages
~68.5% of all Messages are Advertisement
Messages
Data Analysis
Fluctuation by Month, 1998
16,000,000.00
Number of Messages
14,000,000.00
12,000,000.00
10,000,000.00
Series1
8,000,000.00
6,000,000.00
4,000,000.00
2,000,000.00
0.00
1
2
3
4
5
6
Month
7
9
10
11
Data Analysis (cont.)
Fluctuation By Day, 1998
1,800,000
1,600,000
1,200,000
1,000,000
800,000
600,000
400,000
200,000
Date
1120
1110
1031
1021
1011
1001
921
911
901
722
712
702
622
612
602
523
513
503
423
413
403
322
312
302
220
210
131
121
111
-
101
Number of Messages
1,400,000
Data Analysis (cont.)
Fluctuation By Day, 03/2003
8,000,000
7,000,000
Number of Messages
6,000,000
5,000,000
4,000,000
3,000,000
2,000,000
1,000,000
0
301 302 303
304 305 306
307 308 309
310 311
Date
312 313 314
315 316 317
318 319 320
Data Analysis (cont.)
Fluctuation By Hour, 1998
6,000,000
Number of Messages
5,000,000
4,000,000
3,000,000
2,000,000
1,000,000
0
0
1
2
3
4
5
6
7
8
9
10 11
12 13
Hour
14 15 16
17 18
19 20 21
22 23
Data Analysis (cont.)
Fluctuation By Hour, 03/2003
7,000,000
Number of Messages
6,000,000
5,000,000
4,000,000
3,000,000
2,000,000
1,000,000
0
0
1
2
3
4
5
6
7
8
9
10
11
12
Hour
13
14
15
16
17
18
19
20
21
22
23
Some Advices

Optimize your query


Test on bgpbaby first


Some queries are going to take several hours to
execute
This is a smaller version of bgpdata (~1G)
Don’t try to execute all your queries on last day

The SQL Server database is going to be
overwhelmed