NOC Tools
Donal O’Cearbhaill

HEAnet Ltd.
• Ireland’s National Education and Research Network
• Provides Internet services to Irish Universities
• 2005: Broadband for Schools

Broadband for Schools
• Free ‘always on’ broadband connectivity to Schools
• 3-year agreement
  – Dept of Education/Dept of Communication/TIF
• 3,925+ Schools
• 7 Access Providers
• HEAnet backbone network
• Onward connectivity to Internet & Educational Networks
• HEAnet Managed Services: Network; Security; E-Mail

Challenges
• 4,000 schools
• Highly contended links
• A lot of satellite connections
• SLA/Contract enforcement
  – Installation Rate
  – Monitoring/ISP

Infrastructure
• 28 Debian/Ubuntu servers
• 4 Fibrenetix disk arrays
  – Disk-based backup
  – rsync & application-level dumps
  – Syslog nodes
• PostgreSQL database
• Aggregation Routers
  – 7301
  – PPPoE
  – GRE
• Border/Services Routers
  – 6500, 3750

Tools
• SmokePing
• Nagios
• Rancid
• Cacti
• Netflow

SmokePing
• Latency measurement tool
• Runs probes in parallel
• >3,800 hosts
• RRD backend
  – Reporting
• Historical view
• Acceptance testing
• Tuning
  – FPing timeouts decreased
  – Total number of probes reduced
  – Probe frequency reduced for satellite links

Nagios
• 4,131 services on 3,905 hosts
• In the top 5 installations by number of hosts on nagios.org
• Populated by SmokePing and memcache
  – Nagios runs checks serially
  – >1 hour vs. 15 mins
• Nagios populates
  – Sidebar alarms
  – Schools Up Graph

Rancid
• Really Awesome New Cisco confIg Differ
• 3,296 router configs
• Maintains history of changes
  – Mails changes

Cacti
• 3,900 hosts
• Data gathering
  – SNMP
  – External Perl scripts
• Graph templating
• Database driven
• Polling time with Cricket: 27 mins
  – Perl
• Polling time with Cacti: <5 mins
  – Cactid
  – Custom multithreaded C application

Cacti Weathermap

Interconnects

Netflow
• NfSen is a graphical web-based front end for the nfdump NetFlow tools
• Query abuse reports
• Usage reporting

Reporting
• Daily Reports
• DNS log reporting
• Report infected PCs
  – Top MX lookups
  – Misconfigurations
  – Active Directory
• Netflow
  – IPs
  – Schools usage

Gigabytes downloaded by schools on 22/03/07: 332
Gigabytes uploaded by schools on 22/03/07: 48
Total MegaBytes downloaded for Digiweb Satellite: 12834
Total MegaBytes uploaded for Digiweb Satellite: 1202
Total MegaBytes downloaded for Digiweb Wireless: 77578
Total MegaBytes uploaded for Digiweb Wireless: 10217
Total MegaBytes downloaded for ESATBT ADSL: 54352
Total MegaBytes uploaded for ESATBT ADSL: 6632
Total MegaBytes downloaded for HSData Wireless: 3047
Total MegaBytes uploaded for HSData Wireless: 575
…..
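The Nagios slide says Nagios is populated from SmokePing and memcache because running its checks serially took over an hour. One standard way to push externally gathered results into Nagios is a passive check result, written to its external command file. The sketch below turns a SmokePing-style loss/latency pair into such a line. It is a minimal illustration, not HEAnet's code: the thresholds, host name and service name are hypothetical.

```python
import time
from typing import Optional

# Nagios service states used in passive check results
OK, WARNING, CRITICAL = 0, 1, 2


def classify(loss_pct: float, rtt_ms: float, warn_rtt_ms: float = 500.0) -> int:
    """Map packet loss and median RTT to a Nagios state.

    The thresholds are hypothetical; satellite links would need far looser
    latency limits than ADSL or wireless.
    """
    if loss_pct >= 100.0:
        return CRITICAL  # every probe lost: treat the link as down
    if loss_pct > 0.0 or rtt_ms > warn_rtt_ms:
        return WARNING
    return OK


def passive_result(host: str, service: str, loss_pct: float, rtt_ms: float,
                   now: Optional[int] = None) -> str:
    """Format one external command submitting a passive service check result."""
    state = classify(loss_pct, rtt_ms)
    timestamp = int(time.time()) if now is None else now
    output = f"loss={loss_pct:.0f}% rtt={rtt_ms:.1f}ms"
    return (f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{state};{output}")
```

Each line would be appended, newline-terminated, to the command file named by `command_file` in `nagios.cfg`. Nagios then only processes results rather than executing thousands of probes itself.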
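The Rancid slide describes keeping a history of 3,296 router configs and mailing the changes. The core of that workflow is a diff between the previous and current snapshot that ignores lines which change on every collection. Here is a sketch using Python's `difflib`. The volatile-line list is an illustrative assumption, not Rancid's actual filter set.

```python
import difflib
from typing import List

# Lines that change on every collection and would otherwise produce noise.
# This prefix list is an illustrative assumption, not Rancid's real filter set.
VOLATILE_PREFIXES = ("ntp clock-period",)


def _stable_lines(config: str) -> List[str]:
    """Split a config into lines, dropping volatile ones."""
    return [line for line in config.splitlines(keepends=True)
            if not line.strip().startswith(VOLATILE_PREFIXES)]


def config_diff(old: str, new: str, name: str) -> str:
    """Return a unified diff between two config snapshots ('' when unchanged)."""
    diff = difflib.unified_diff(_stable_lines(old), _stable_lines(new),
                                fromfile=f"{name} (previous)",
                                tofile=f"{name} (current)")
    return "".join(diff)
```

An empty result means nothing worth reporting. A non-empty result is what would be mailed, for example with `smtplib`. Rancid itself also commits each snapshot to a version-control repository, which provides the change history.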
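The daily report above lists megabytes downloaded and uploaded per access provider. Below is a sketch of the aggregation step, assuming flow records have already been reduced to `(provider, bytes_down, bytes_up)` tuples, for example by mapping each school IP to its provider. How the records are exported from nfdump is not shown in the slides. The binary-megabyte unit is also an assumption.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

MB = 1024 * 1024  # assumption: the report uses binary megabytes


def usage_report(flows: Iterable[Tuple[str, int, int]]) -> List[str]:
    """Summarise (provider, bytes_down, bytes_up) records into report lines."""
    down = defaultdict(int)
    up = defaultdict(int)
    for provider, bytes_down, bytes_up in flows:
        down[provider] += bytes_down
        up[provider] += bytes_up
    lines = []
    for provider in sorted(down):  # alphabetical, as in the sample report
        lines.append(f"Total MegaBytes downloaded for {provider}: {down[provider] // MB}")
        lines.append(f"Total MegaBytes uploaded for {provider}: {up[provider] // MB}")
    return lines
```

Summing raw bytes and converting only at the end avoids the rounding drift you would get by converting each flow to megabytes first.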
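The SmokePing and Cacti slides both rely on parallelism: SmokePing runs probes in parallel, and Cactid is a multithreaded poller, which brought polling from 27 minutes with Cricket to under 5 minutes. The general idea is a bounded pool of workers running a per-host probe. Here it is sketched with Python's thread pool. The `probe` callable and the default of 64 workers are hypothetical, and this is not how either tool is implemented.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional

# A probe returns a latency in ms, or None if the host did not answer.
Probe = Callable[[str], Optional[float]]


def poll_all(hosts: Iterable[str], probe: Probe,
             workers: int = 64) -> Dict[str, Optional[float]]:
    """Run probe(host) for every host on a bounded pool of worker threads."""
    host_list = list(hosts)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results line up with host_list
        results = list(pool.map(probe, host_list))
    return dict(zip(host_list, results))
```

The probe must enforce its own timeout, because a hung probe occupies a worker for as long as it waits. That is one reason shorter FPing timeouts shorten the whole cycle.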
Logging
• Syslog server per PoP
  – Servers
  – Routers
• Logcheck
  – Logfile scanner
• IP to school identifier
  – Mapping IP to school

Server Monitoring
• SSH keys
  – Sharing keys/fingerprints
  – High overhead
• SNMP
  – Less configurable
• Memcache
  – Local Perl script
  – Easy to roll out
  – Load
  – Disk space
  – Monitor processes

Memcache
• Distributed memory caching system
• Low overhead
• Speeds up dynamic database-driven websites by caching data and objects in memory
• Developed for LiveJournal; also used by
  – Slashdot
  – Wikipedia
  – SourceForge
• Used for Schools
  – Nagios
  – Maps
  – Server status

Subversion
• Modern replacement for CVS
• Provisioning System
  – Configs
• ViewCVS
• Check-ins get mailed
• Schools-noc
  – Scripts stored on every server
  – Automatically updated
  – cron.d

Sidebar
• Nagios polled every minute
• Populated into memcache
• Sidebar alarms
• Pubcookie single sign-on

Provisioning System
• Services provisioned
  – CPE router config
  – Nagios
  – RADIUS
  – Cacti
  – Cisco ACS (TACACS+)
  – SmokePing
  – FortiGate (content filtering)
  – Maps
  – DNS
  – Webhosting

Provisioning System
• Text::Template templating system
• Data stored in authoritative database
• PostgreSQL’s INET type is brilliant!
• Perl scripts generate configlets
• Configlets added to Subversion
• Perl/Shell provisioning agents handle service restarts etc.
• Ability to stop all provisioning

Provisioning System Structure
• PostgreSQL: Database
• Perl: Configuration generator
• Subversion: Revision control
• Shell scripts: Provisioning agents

Google Maps

Random things we’ve encountered
• Predictable traffic levels
• SmokePing, Nagios and Cricket/Cacti take a lot of tuning to monitor our network
• Difficult to achieve both high bandwidth and a high level of reliability in a transparent content filter
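The Logging slide mentions mapping log IP addresses to a school identifier. This is a longest-prefix match, sketched below with Python's `ipaddress` module. The prefixes and school identifiers are hypothetical.

```python
import ipaddress
from typing import Dict, List, Optional, Tuple, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def build_index(assignments: Dict[str, str]) -> List[Tuple[Network, str]]:
    """Turn {prefix: school_id} into a list sorted most-specific first."""
    index = [(ipaddress.ip_network(prefix), school)
             for prefix, school in assignments.items()]
    index.sort(key=lambda item: item[0].prefixlen, reverse=True)
    return index


def school_for_ip(index: List[Tuple[Network, str]], ip: str) -> Optional[str]:
    """Longest-prefix match of an address against the school assignments."""
    addr = ipaddress.ip_address(ip)
    for network, school in index:
        if addr in network:  # mixed IPv4/IPv6 membership tests simply return False
            return school
    return None
```

A linear scan is adequate for a few thousand prefixes. When the assignments live in PostgreSQL, the same lookup can be done in SQL with the `inet` containment operator `<<=`.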
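The Server Monitoring, Memcache and Sidebar slides describe a local script that pushes load and disk-space figures into memcache, where Nagios, the maps and the sidebar read them. The sketch below shows that pattern in Python. An in-process TTL store stands in for memcached, since a real deployment would use a memcached client library. The key format `status:<hostname>` and the 120-second expiry are hypothetical.

```python
import os
import shutil
import socket
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-process stand-in for memcached: values expire after `ttl` seconds."""

    def __init__(self, clock: Callable[[], float] = time.time):
        self._clock = clock
        self._data: Dict[str, Tuple[float, Any]] = {}

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._data[key] = (self._clock() + ttl, value)

    def get(self, key: str) -> Optional[Any]:
        item = self._data.get(key)
        if item is None:
            return None
        expires, value = item
        if self._clock() >= expires:
            del self._data[key]  # expired: behave like a cache miss
            return None
        return value


def collect_status(path: str = "/") -> Dict[str, float]:
    """Gather 1-minute load average and disk usage (os.getloadavg is Unix-only)."""
    load1, _, _ = os.getloadavg()
    usage = shutil.disk_usage(path)
    return {"load1": load1, "disk_used_pct": 100.0 * usage.used / usage.total}


def publish(cache: TTLCache, ttl: float = 120) -> str:
    """Store this server's status under a per-host key and return the key."""
    key = f"status:{socket.gethostname()}"
    cache.set(key, collect_status(), ttl)
    return key
```

The expiry is the useful part of the design. If a server stops reporting, its entry disappears, and the reader can treat the cache miss itself as an alarm.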
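The Provisioning System slides describe configlets generated from the authoritative database with Text::Template, using PostgreSQL’s INET type for addressing. Below is a Python analogue, with `string.Template` swapped in for the Perl module and `ipaddress` deriving addresses from a prefix. The template text, the row field names and the "first usable address" rule are all hypothetical.

```python
import ipaddress
from string import Template

# Hypothetical CPE configlet; the real templates are not shown in the slides
CPE_TEMPLATE = Template("""\
hostname $hostname
interface FastEthernet0
 ip address $lan_ip $lan_mask
""")


def render_cpe(row: dict) -> str:
    """Render a configlet from one database row (field names are hypothetical)."""
    # Parses a CIDR string, e.g. an inet/cidr column fetched as text;
    # raises ValueError if host bits are set in the prefix.
    lan = ipaddress.ip_network(row["lan_prefix"])
    router_ip = next(lan.hosts())  # assumption: the CPE takes the first usable address
    # substitute() raises KeyError for a missing placeholder instead of
    # silently emitting a broken config
    return CPE_TEMPLATE.substitute(
        hostname=row["hostname"],
        lan_ip=str(router_ip),
        lan_mask=str(lan.netmask),
    )
```

Failing loudly on missing or malformed data matters here, because the output is committed to Subversion and then picked up by the provisioning agents.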
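The slides say provisioning agents handle service restarts, with the ability to stop all provisioning. Below is a sketch of one agent step. It honours a global stop file, skips unchanged configlets, replaces the file atomically and restarts only on a real change. The stop-file path and the `restart` callback are hypothetical; the slides do not say how the stop mechanism was implemented.

```python
import os
from pathlib import Path
from typing import Callable

STOP_FILE = Path("/etc/provisioning/STOP")  # hypothetical kill-switch location


def apply_configlet(target: Path, content: str, restart: Callable[[], None],
                    stop_file: Path = STOP_FILE) -> str:
    """Install a generated configlet and restart the service if it changed."""
    if stop_file.exists():
        return "skipped: provisioning stopped"
    if target.exists() and target.read_text() == content:
        return "unchanged"  # avoid needless service restarts
    tmp = target.with_suffix(target.suffix + ".tmp")
    tmp.write_text(content)
    os.replace(tmp, target)  # atomic rename when tmp and target share a filesystem
    restart()
    return "applied"
```

Checking the stop file before every change means a single `touch` halts provisioning across all agents. The skip on unchanged content keeps a periodic run from restarting services that did not change.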