CS519 BGP Project Report Kai-Wen Chung (kc279) San-Yiu Cheng (sc345) How to Proceed BGP Analysis Collect Raw Data Import into Database Query Database and Analyze data.
Download ReportTranscript CS519 BGP Project Report Kai-Wen Chung (kc279) San-Yiu Cheng (sc345) How to Proceed BGP Analysis Collect Raw Data Import into Database Query Database and Analyze data.
CS519 BGP Project Report Kai-Wen Chung (kc279) San-Yiu Cheng (sc345) How to Proceed BGP Analysis Collect Raw Data Import into Database Query Database and Analyze data Collect Raw Data MAE-EAST (1998.1 ~ 1998.11) http://archive.routeviews.org/ (2003.1 ~ 2003.3) Database Schema Original Schema DataSet Name DID FromFile Descr ImportDate CollectDate Type int varchar(255) varchar(512) DateTime varchar(32) Message Name DID MsgID MsgType MsgTime PeerIP PeerAS Prefix PrefixMask Origin NextHop LocalPref Med Community Aggregate AggregateIP Type int int varchar(1) DateTime bigint int bigint int varchar(16) bigint int int varchar(8) varchar(8) bigint MsgPath Field Name DID MsgID PathOrder ASID Alternative Type int int tinyint int BIT Database Schema (cont.) Record Size Message: 94 bytes/record MsgPath: 18 bytes/record # Record Message: 104,841,405 (98.1 ~ 98.11) MsgPath: 251,442,478 (98.1 ~ 98.11) Database Schema (cont.) Database space allocation: 20GB About 12 hours to import raw data for 1 month (about 10,000,000 messages, and 20,000,000 paths) Data volume reaches limitation soon Our Solution Allocate larger space Move Database from SQLServer -> Sparrow Total 70GB Modify data schema to reduced record size Data Schema Modification DataSet Name DID FromFile Descr ImportDate CollectDate Type int varchar(255) varchar(512) DateTime varchar(32) OriginTable Field Name Type id tinyint origin varchar(16) AggrTable Field Name Type id tinyint aggr varchar(16) Message Name DID MsgID MsgType MsgTime PeerIP PeerAS Prefix PrefixMask Origin NextHop LocalPref Med Community Aggregate AggregateIP Type int int varchar(1) DateTime int int int tinyint tinyint int int int varchar(8) tinyint int MsgPath Field Name DID MsgID PathOrder ASID Alternative IPTable Field Name id ipval ip Type int int tinyint int BIT Type int bigint varchar(16) Data Schema Modification Record Size Size Reduces Message: 52 bytes/record MsgPath: 14 bytes/record Message: 46.9% MsgPath: 22.2% Faster Data Importing Current Status Database P3-500 with 128MB ram, and Windows 2000 Server and SQL Server 2000 installed Imported Data 1998.1 ~ 1998.11. About 21GB in DB 2003.3. About 34GB in DB Current Database Issue SQL Server Performance It could take several hours to run a query Space problem 70GB is only enough for data of 1 ~ 2 month (of 2003) We need a “Tera-byte” Database to accommodate all data of 2002, and 2003 Summary of Data Total space used: Number of Messages: ~55G (1998 and 03/2003) ~220.5 Million (1998 and 03/2003) Number of DataSet: ~30,000 (1998 and 03/2003) Summary of Data (cont.) A small number of IP addresses dominate the routing table 15 Source IP addresses occupy about 68% of the PeerIp field of the Messages 15 Destination IP Addresses occupy about 47% of the NextHop field of the Messages Summary of Data (cont.) Advertisement Vs. Withdrawal Messages There are about 220 Million Messages ~31.5% of all Messages are Withdrawal Messages ~68.5% of all Messages are Advertisement Messages Data Analysis Fluctuation by Month, 1998 16,000,000.00 Number of Messages 14,000,000.00 12,000,000.00 10,000,000.00 Series1 8,000,000.00 6,000,000.00 4,000,000.00 2,000,000.00 0.00 1 2 3 4 5 6 Month 7 9 10 11 Data Analysis (cont.) Fluctuation By Day, 1998 1,800,000 1,600,000 1,200,000 1,000,000 800,000 600,000 400,000 200,000 Date 1120 1110 1031 1021 1011 1001 921 911 901 722 712 702 622 612 602 523 513 503 423 413 403 322 312 302 220 210 131 121 111 - 101 Number of Messages 1,400,000 Data Analysis (cont.) Fluctuation By Day, 03/2003 8,000,000 7,000,000 Number of Messages 6,000,000 5,000,000 4,000,000 3,000,000 2,000,000 1,000,000 0 301 302 303 304 305 306 307 308 309 310 311 Date 312 313 314 315 316 317 318 319 320 Data Analysis (cont.) Fluctuation By Hour, 1998 6,000,000 Number of Messages 5,000,000 4,000,000 3,000,000 2,000,000 1,000,000 0 0 1 2 3 4 5 6 7 8 9 10 11 12 13 Hour 14 15 16 17 18 19 20 21 22 23 Data Analysis (cont.) Fluctuation By Hour, 03/2003 7,000,000 Number of Messages 6,000,000 5,000,000 4,000,000 3,000,000 2,000,000 1,000,000 0 0 1 2 3 4 5 6 7 8 9 10 11 12 Hour 13 14 15 16 17 18 19 20 21 22 23 Some Advices Optimize your query Test on bgpbaby first Some queries are going to take several hours to execute This is a smaller version of bgpdata (~1G) Don’t try to execute all your queries on last day The SQL Server database is going to be overwhelmed