NOC Tools
Donal O’Cearbhaill
HEAnet Ltd.
• Ireland’s National Education and Research Network
• Provides Internet services to Irish Universities
• 2005 - Broadband for Schools
Broadband for Schools
• Free ‘always on’ broadband connectivity to Schools
• 3 Year Agreement
– Dept of Education/Dept of Communication/TIF
• 3,925+ Schools
• 7 Access Providers
• HEAnet backbone network
• Onward connectivity to Internet & Educational Networks
• HEAnet Managed Services: Network; Security; E-Mail
Challenges
• 4,000 schools
• Highly contended links
• A lot of satellite connections
• SLA/Contract enforcement
• Installation Rate
Monitoring/ISP Infrastructure
• 28 Debian/Ubuntu servers
• 4 Fibrenetix disk arrays
– Disk based backup
– rsync & application level dumps
– Syslog nodes
• PostgreSQL database
• Aggregation Routers
– 7301
– PPPoE
– GRE
• Border/Services Routers
– 6500, 3750
Tools
• SmokePing
• Nagios
• Rancid
• Cacti
• Netflow
SmokePing
• Latency measurement tool
• Runs probes in parallel
• >3,800 hosts
• RRD backend
– Reporting
• Historical view
• Acceptance testing
• Tuning
– FPing timeouts decreased
– Total number of probes reduced
– Satellite frequency reduced
Nagios
• 4,131 services on 3,905 hosts
• Among the top 5 installations by number of hosts listed on nagios.org
• Populated by SmokePing and memcache
– Nagios runs checks serially
– >1 hour vs. 15 mins
• Nagios populates
– sidebar alarms
– Schools Up Graph
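The hand-off from SmokePing data to Nagios could look roughly like the sketch below. It assumes, which the slides do not state, that results are submitted as passive checks through the Nagios external command file. The function name, the "PING" service name and the loss thresholds are illustrative only.

```python
import time

# Nagios plugin return codes
OK, WARNING, CRITICAL = 0, 1, 2

def passive_result(host, loss_pct, now=None, warn=20.0, crit=100.0):
    """Format one PROCESS_SERVICE_CHECK_RESULT external command line.

    The "PING" service name and the thresholds are hypothetical.
    """
    now = int(time.time()) if now is None else int(now)
    if loss_pct >= crit:
        code, text = CRITICAL, "CRITICAL"
    elif loss_pct >= warn:
        code, text = WARNING, "WARNING"
    else:
        code, text = OK, "OK"
    output = f"PING {text} - packet loss {loss_pct:.0f}%"
    return f"[{now}] PROCESS_SERVICE_CHECK_RESULT;{host};PING;{code};{output}"

if __name__ == "__main__":
    # In a real setup this line would be written to the Nagios command pipe.
    print(passive_result("school-00001", 100.0))
```

Because the latency probes have already run in parallel, formatting and submitting results this way avoids having Nagios repeat every check serially.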
Rancid
• Really Awesome New Cisco confIg Differ
• 3,296 Router configs
• Maintains history of changes
– Mails changes
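The "maintains history, mails changes" idea can be illustrated with a unified diff between two saved revisions of a config. This is not RANCID's own code; the sample configs and the function name are made up for illustration.

```python
import difflib

def config_diff(old_text, new_text, name="router.cfg"):
    """Return a unified diff between two config revisions, or '' if equal."""
    diff = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=f"{name} (previous)",
        tofile=f"{name} (current)",
    )
    return "".join(diff)

# Hypothetical before/after configs for one CPE router.
old = "hostname school-rtr\ninterface Dialer0\n mtu 1492\n"
new = "hostname school-rtr\ninterface Dialer0\n mtu 1400\n"

if __name__ == "__main__":
    # A non-empty diff is the kind of output that would be mailed out.
    print(config_diff(old, new))
```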
Cacti
• 3,900 hosts
• Data gathering
– SNMP
– External Perl scripts
• Graph templating
• Database driven
• Cricket: 27 mins
– Perl
• Cacti: <5 mins
– Cactid
– Custom multithreaded C application
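The gap between the two polling times comes from polling many hosts concurrently rather than one after another. A minimal sketch of that idea, using a thread pool and a stand-in poll function that only sleeps (no real SNMP is performed, and the host names are invented):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def poll_host(host):
    """Stand-in for an SNMP poll: waits briefly, returns a placeholder value."""
    time.sleep(0.01)          # simulated network round trip
    return host, len(host)    # placeholder, not real counter data

def poll_all(hosts, workers=20):
    """Poll every host using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(poll_host, hosts))

hosts = [f"school-{n:04d}" for n in range(200)]

if __name__ == "__main__":
    start = time.perf_counter()
    results = poll_all(hosts)
    elapsed = round(time.perf_counter() - start, 2)
    print(len(results), "hosts polled in", elapsed, "s")
```

With most of each poll spent waiting on the network, total time scales roughly with the number of hosts divided by the number of workers.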
Cacti Weathermap
Interconnects
Netflow
• NfSen is a graphical web based front end for the nfdump netflow tools
• Query abuse reports
• Usage reporting
Reporting
• Daily Reports
• DNS log reporting
• Report infected PCs
– Top MX lookups
– Misconfigurations
– Active Directory
• Netflow
– IPs
– Schools usage
Gigabytes downloaded by schools on 22/03/07: 332
Gigabytes uploaded by schools on 22/03/07 : 48
Total MegaBytes downloaded for Digiweb Satellite: 12834
Total MegaBytes uploaded for Digiweb Satellite: 1202
Total MegaBytes downloaded for Digiweb Wireless: 77578
Total MegaBytes uploaded for Digiweb Wireless: 10217
Total MegaBytes downloaded for ESATBT ADSL: 54352
Total MegaBytes uploaded for ESATBT ADSL: 6632
Total MegaBytes downloaded for HSData Wireless: 3047
Total MegaBytes uploaded for HSData Wireless: 575
…..
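A report in this style could be produced by summing flow byte counts per access provider. The sketch below assumes flow records have already been reduced to (provider, direction, bytes) tuples and that a megabyte is 1,000,000 bytes; neither detail is given in the slides, and the sample records are invented.

```python
from collections import defaultdict

MB = 1000 * 1000  # assumed decimal megabytes

def usage_by_provider(flows):
    """Sum (provider, direction, bytes) records into whole megabytes."""
    totals = defaultdict(int)
    for provider, direction, nbytes in flows:
        totals[(provider, direction)] += nbytes
    return {key: total // MB for key, total in totals.items()}

def report_lines(totals):
    """Render totals in the same style as the daily report."""
    verbs = {"down": "downloaded", "up": "uploaded"}
    return [
        f"Total MegaBytes {verbs[direction]} for {provider}: {mb}"
        for (provider, direction), mb in sorted(totals.items())
    ]

# Made-up flow records, not HEAnet data.
sample = [
    ("Example ADSL", "down", 3_000_000),
    ("Example ADSL", "down", 2_500_000),
    ("Example ADSL", "up", 1_000_000),
]

if __name__ == "__main__":
    print("\n".join(report_lines(usage_by_provider(sample))))
```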
Logging
• Syslog server per PoP
– Servers
– Routers
• Logcheck
– Logfile scanner
• IP to school identifier
– Mapping IP to school
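Mapping an IP address in a log line back to a school can be sketched as a subnet lookup. The prefixes below are documentation ranges and the school identifiers are invented; the real mapping presumably comes from the provisioning database.

```python
import ipaddress

# Hypothetical allocations using documentation prefixes.
SCHOOL_NETS = {
    ipaddress.ip_network("192.0.2.0/29"): "school-00001",
    ipaddress.ip_network("192.0.2.8/29"): "school-00002",
    ipaddress.ip_network("198.51.100.0/28"): "school-00003",
}

def school_for_ip(ip):
    """Return the school identifier whose subnet contains ip, or None."""
    addr = ipaddress.ip_address(ip)
    for net, school in SCHOOL_NETS.items():
        if addr in net:
            return school
    return None

def tag_log_line(line):
    """Append school identifiers for any known IPs found in a log line."""
    tags = []
    for token in line.replace(",", " ").split():
        try:
            school = school_for_ip(token)
        except ValueError:
            continue  # token is not an IP address
        if school:
            tags.append(school)
    return f"{line} [{', '.join(tags)}]" if tags else line

if __name__ == "__main__":
    print(tag_log_line("denied 198.51.100.5 port 25"))
```

A linear scan is adequate for a sketch; with several thousand subnets a sorted or prefix-tree lookup would be the more natural choice.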
Server Monitoring
• SSH keys
– Sharing keys/fingerprints
– High overhead
• SNMP
– Less configurable
• Memcache
– Local Perl script
– Easy to roll out
– Load
– Disk Space
– Monitor Processes
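The memcache approach to server monitoring amounts to a small local script that writes its status under a per-host key for the monitoring side to read. The slides mention a Perl script; the sketch below shows the same idea in Python, with an in-memory stand-in for the cache client and a hypothetical key format.

```python
import json
import os
import shutil
import time

class FakeCache:
    """In-memory stand-in for a memcache client (set/get only)."""
    def __init__(self):
        self._data = {}
    def set(self, key, value):
        self._data[key] = value
    def get(self, key):
        return self._data.get(key)

def collect_status(path="/"):
    """Gather load average and disk usage for the local server."""
    try:
        load1 = os.getloadavg()[0]
    except (AttributeError, OSError):
        load1 = None  # not available on this platform
    disk = shutil.disk_usage(path)
    used_pct = round(100 * disk.used / disk.total, 1) if disk.total else 0.0
    return {"time": int(time.time()), "load1": load1, "disk_used_pct": used_pct}

def publish(cache, hostname, status):
    """Store the status under a per-host key (key format is hypothetical)."""
    cache.set(f"status:{hostname}", json.dumps(status))

cache = FakeCache()
publish(cache, "mon1", collect_status())
```

Run from cron on each server, a script like this needs no inbound access or shared keys, which matches the "easy to roll out" point above.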
Memcache
• Distributed memory caching system
• Low overhead
• Speed up dynamic database-driven websites by caching data and objects in memory
• Developed for LiveJournal
– Slashdot
– Wikipedia
– SourceForge
• Schools
– Nagios
– Maps
– Server status
Subversion
• Modern replacement for CVS
• Provisioning System
– Configs
• ViewCVS
• Checkins get mailed
• Schools-noc
– Scripts stored on every server
– Automatically updated
– cron.d
Sidebar
• Nagios polled every minute
• Populated into memcache
• Sidebar alarms
• Pubcookie single sign-on
Provisioning System
• Services provisioned
– CPE router config
– Nagios
– RADIUS
– Cacti
– Cisco ACS (TACACS+)
– SmokePing
– Fortigate (Content filtering)
– Maps
– DNS
– Webhosting
Provisioning System
• Text::Template templating system
• Data stored in authoritative database
• PostgreSQL’s INET type is brilliant!
• Perl scripts generate configlets
• Added to Subversion
• Perl/Shell provisioning agents handle service restarts etc.
• Ability to stop all provisioning
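The template-driven step can be sketched as follows. The slides name Perl's Text::Template; this example swaps in Python's string.Template to show the same pattern of filling a configlet from one database row. The template text, field names and values are all hypothetical.

```python
from string import Template

# Hypothetical CPE configlet template; the real templates are not shown.
CPE_TEMPLATE = Template(
    "hostname $hostname\n"
    "interface Vlan1\n"
    " ip address $lan_ip $lan_mask\n"
)

def render_configlet(row):
    """Fill the template from one database row (represented as a dict).

    substitute() raises KeyError if a field is missing, so an incomplete
    row fails loudly instead of producing a partial config.
    """
    return CPE_TEMPLATE.substitute(row)

# Stand-in for a row fetched from the authoritative database.
row = {
    "hostname": "school-00001",
    "lan_ip": "192.0.2.1",
    "lan_mask": "255.255.255.248",
}

if __name__ == "__main__":
    print(render_configlet(row))
```

The rendered text is what would then be committed to revision control and picked up by a provisioning agent.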
Provisioning System Structure
• PostgreSQL: Database
• Perl: Configuration generator
• Subversion: Revision control
• Shell scripts: Provisioning agents
Google Maps
Random things we’ve encountered
• Predictable traffic levels
• SmokePing, Nagios and Cricket/Cacti take a lot of tuning to monitor our network
• Difficult to achieve high bandwidth and a high level of reliability in a transparent content filter