Introduction - Northern Kentucky University


CIT 470: Advanced Network and System Administration Workstations and Servers

CIT 470: Advanced Network and System Administration Slide #1

Topics

1. Machine Lifecycle

2. Automated Installs

3. Server Hardware

4. Services

Machine Lifecycle

Workstation Management


States of Machines

New: A new machine.

Clean: OS installed, but not yet configured for environment.

Configured: Configured correctly for the operating environment.

Unknown: Misconfigured, broken, newly discovered, etc.

Off: Retired/surplussed.

State Transitions

Build: Set up hardware and install OS.

Initialize: Configure for environment; often part of build.

Update: Install new software, patch old software, change configurations.
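
The states and transitions above can be read as a small state machine. The sketch below is one possible encoding in Python. The build, initialize, and update transitions come from the slides; the `rebuild` transition (Unknown back to Clean), the choice of which state each action applies to, and all identifiers are assumptions added for illustration.

```python
from enum import Enum


class State(Enum):
    NEW = "new"
    CLEAN = "clean"
    CONFIGURED = "configured"
    UNKNOWN = "unknown"
    OFF = "off"


# (current state, action) -> next state.
# build/initialize/update are the transitions named on the slide;
# "rebuild" is an assumed recovery path, not taken from the slide.
TRANSITIONS = {
    (State.NEW, "build"): State.CLEAN,
    (State.CLEAN, "initialize"): State.CONFIGURED,
    (State.CONFIGURED, "update"): State.CONFIGURED,
    (State.UNKNOWN, "rebuild"): State.CLEAN,
}


def apply(state, action):
    """Return the state reached by `action`, or raise if it is not allowed."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(
            f"cannot {action} a machine in state {state.value}"
        ) from None


if __name__ == "__main__":
    state = State.NEW
    for action in ("build", "initialize", "update"):
        state = apply(state, action)
    print(state.value)
```

Keeping the table explicit makes it easy to see that a machine in the Unknown state has no direct path to Configured in this sketch: it has to be rebuilt first.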


Automated Installs

Why Automate Installs?

1. Save time.

Boot the computer, then go do something else.

2. Ensure consistency.

No chance of entering wrong input during install.

Avoid user requests due to mistakes in config.

What works on one desktop, works on all.

3. Fast system recovery.

Rebuild system with auto-install vs. slow tapes.


Trusting the Vendor Installation

Always reload the OS on new machines.

– You need to configure the host for your env.

– Eventually you’ll reload the OS on a desktop, leaving you with two platforms to support: the vendor OS install and your OS install.

– Vendors change their OS images from time to time, so systems you bought today have a different OS from systems bought 6 months ago.


Install Types

1. Hard Disk Imaging

– Duplicate hard disk of installed system.

– Advantages: fast, simple.

– Disadvantages: need identical hardware; leads to many images, all of which must be updated manually when you make a change.

2. Scripted Installs

– Installer accepts input from script.

– Advantages: flexible, systems can be different.

– Disadvantages: more effort to set up initially.

Auto-Install Features

1. Unattended: Requires little or no human interaction.

2. Concurrent: Multiple installs can be performed at once.

3. Scalable: New clients added easily.

4. Flexible: Configurable to do custom install types.

Auto-Install Components

Boot Component

– Media (floppy or CD)

– Network (PXE)

Network Configuration

– DHCP: IP addresses, netmasks, DNS

Install Configuration

– Media (floppy or CD)

– Network (tftp, ftp, http, NFS)

Install Data and Programs

– Network (tftp, ftp, http, NFS)

PXE

Preboot eXecution Environment

– Intel standard for booting over the network.

– PXE BIOS loads kernel over network.

Applications

– Diskless clients (use NFS for root disk).

– Booting install program.

How it works

1. Asks DHCP server for config (IP, net, tftp).

2. Downloads pxelinux from tftp server.

3. Runs pxelinux, a boot loader that fetches and boots the kernel.

4. Kernel uses tftp’d filesystem image or NFS filesystem.

Disk Imaging

1. Set up ftp server.

2. Install OS image on a test client.

3. Verify test client OS.

4. Copy image to server.

5. Boot clients with imaging media.

6. Clients pull image from ftp server.

[Figure: ftp server (step 1), test client (steps 2-3), and deployment clients #1 and #2 (step 5); the image is copied to the server (step 4) and pulled by the clients (step 6).]

Clonezilla


g4u


Scripted Install Tools

Red Hat distributions, incl. CentOS

– Kickstart

– Cobbler

Debian distributions, incl. Ubuntu

– FAI

– Preseed

Mandriva Linux

– DrakX

Solaris

– JumpStart

Network Configuration

What’s so bad about manual net settings?

– It’s only an IP address and netmask.

– What happens if you need to renumber?

Use DHCP instead of manual settings

– Make all changes on a single server.

– Easy to change settings for entire network.

– DHCP can assign static IPs as well as dynamic.
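
To make the renumbering point concrete, here is a minimal sketch that regenerates fixed-address entries for every host after a network change. It assumes ISC dhcpd-style `host` declarations as the output format; the MAC address, the network ranges, and the function names are hypothetical.

```python
from ipaddress import IPv4Address, IPv4Network


def host_stanza(name, mac, ip):
    """Format one fixed-address entry in ISC dhcpd.conf style."""
    return (
        f"host {name} {{\n"
        f"  hardware ethernet {mac};\n"
        f"  fixed-address {ip};\n"
        f"}}"
    )


def renumber(hosts, old_net, new_net):
    """Move every host to the same offset inside a new network."""
    old, new = IPv4Network(old_net), IPv4Network(new_net)
    moved = {}
    for name, (mac, ip) in hosts.items():
        offset = int(IPv4Address(ip)) - int(old.network_address)
        moved[name] = (mac, str(new.network_address + offset))
    return moved


if __name__ == "__main__":
    # Hypothetical inventory: host name -> (MAC address, current IP).
    hosts = {"romero": ("00:16:3e:00:00:01", "192.168.1.54")}
    moved = renumber(hosts, "192.168.1.0/24", "10.20.30.0/24")
    for name, (mac, ip) in moved.items():
        print(host_stanza(name, mac, ip))
```

With manual settings the same change would mean visiting every machine; here it is one regenerated file on the DHCP server.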


Servers vs. Desktops

How are Servers different?

• 1000s of clients depend on server.

• Requires high reliability.

• Requires tighter security.

• Often expected to last longer.

• Investment amortized over many clients, longer lifetime.


Vendor Product Lines

Home

– Cheapest purchase price.

– Components change regularly based on cost.

Business

– Focuses on Total Cost of Ownership (TCO).

– Slower hardware changes, longer lifetime.

Server

– Lowest cost per performance metric (nfs, web).

– Easy to service rack-mountable chassis.

– Higher quality (MIL-SPEC) components.

Server Hardware

• More internal space.

• More CPU/Memory.

– More / high-end CPUs.

– More / faster memory.

• High performance I/O.

– PCIe vs PCI

– SCSI/FC-AL vs. IDE

• Rack mounted.

• Redundancy

– RAID

– Hot-swap, hot-spares

Rack Mounting

Efficient space utilization.

– Simple, rectangular shape measured in RUs.

Repair and upgrade while mounted in rack.

– No side access required.

Requirements

– Cooling through back, not sides.

– Drives in front, cables in back.

– Remote management (serial console, hardware sensors, VM MUI)

Server Memory

Servers need more RAM than desktops.

– x86 supports up to 64GB with PAE.

– x86-64 supports 1 PB (1024 TB).

Servers need faster RAM than desktops.

– Higher memory speeds.

– Multiple DIMMs accessed in parallel.

– Larger CPU caches.
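
The memory limits above follow from address width: with byte addressing, n address bits reach 2^n bytes. The sketch below works through that arithmetic. The 36-bit width is the PAE physical address size; the 50-bit width is an assumption inferred from the 1 PB figure on the slide, and the width actually implemented varies by processor.

```python
GIB = 2**30
TIB = 2**40


def addressable_bytes(address_bits):
    """Bytes reachable with byte addressing and the given address width."""
    return 2**address_bits


if __name__ == "__main__":
    print(addressable_bytes(32) // GIB, "GiB with 32-bit addresses")
    print(addressable_bytes(36) // GIB, "GiB with 36-bit PAE addresses")
    # 50 bits is inferred from the slide's 1 PB figure, not a CPU spec.
    print(addressable_bytes(50) // TIB, "TiB with 50-bit addresses")
```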


Server CPUs

Intel Xeon

• Up to 8 cores with 2 threads each @ 1.8 to 3.3 GHz

• Up to 18 MB L3 cache

AMD Opteron

• 4, 6, 8, or 12 cores @ 1.4 to 3.2 GHz

• Up to 12 MB L3 cache

IBM Power 7

• 4, 6, or 8 cores with 4 threads each @ 3.0 to 4.25 GHz

• 4 MB L3 cache per core (up to 32 MB for 8-core)

Sun Niagara 3

• 16 cores with 8 threads each @ 1.67 GHz

• 6 MB L2 cache

Xeon vs Pentium/Core CPUs

Xeon based on Pentium/Core with changes that vary by model:

– Allows more CPUs

– Has more cores

– Better hyperthreading

– Faster/larger CPU caches

– Faster/larger RAM support

System Buses

Servers need high I/O throughput.

– Fast peripherals: SCSI-3, Gigabit ethernet

– Often use multiple and/or faster buses.

PCI

– Desktop: 32-bit 33 MHz, 133 MB/s

– Server: 64-bit 66 MHz, 533 MB/s

PCI-X (backward compatible)

– v1.0: 64-bit 133 MHz, 1.06 GB/s

– v2.0: 64-bit 533 MHz, 4.3 GB/s

PCI Express (PCIe)

– Serial architecture, v3.0 up to 16 GB/s
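
The PCI and PCI-X figures above are peak rates for a parallel bus: width in bytes times transfers per second. The sketch below reproduces them, assuming the nominal clocks are thirds of 100 MHz (33.33, 66.67, and so on) and one transfer per clock at the quoted rate. The function name is illustrative, and real throughput is lower because of protocol overhead.

```python
def parallel_bus_mb_per_s(width_bits, clock_mhz):
    """Peak transfer rate of a parallel bus in MB/s (one transfer per clock)."""
    return width_bits / 8 * clock_mhz


if __name__ == "__main__":
    buses = [
        ("PCI desktop", 32, 100 / 3),
        ("PCI server", 64, 200 / 3),
        ("PCI-X 1.0", 64, 400 / 3),
        ("PCI-X 2.0", 64, 1600 / 3),
    ]
    for name, width, clock in buses:
        print(f"{name}: {parallel_bus_mb_per_s(width, clock):.0f} MB/s")
```

PCIe is left out because it is serial: its rate depends on lane count and line encoding rather than bus width times clock.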

Hardware Redundancy

Disks are most likely component to fail.

– Use RAID for disk redundancy.

– Cover in detail in Disks lecture.

Power supplies second most likely to fail.

– Use redundant power supplies.

– Many servers need 2 power supplies normally.

– Need 3 power supplies for redundancy.

– Use separate power cord and UPS for each power supply.


Full and n+1 Redundancy

n+1 Redundancy: One component can fail, but the system is still functional.

– Ex: RAID 5, dual NICs with failover

Full Redundancy: Two complete sets of hardware configured with failover mechanism.

– Manual: SA switches to the second system on noticing a failure.

– Automatic: The second system monitors the first and switches over automatically on failure.

– Load-sharing: Both systems serve users, sharing load, but each has capacity to handle entire load on its own. When one fails, other automatically handles entire load.
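
Both ideas reduce to the same check: after the worst-case failure, is the remaining capacity still enough for the load? The sketch below is one way to express that, under the assumption that capacity simply adds up across units. The function name and the numbers in the examples are illustrative.

```python
def survives_failures(capacities, load, failures=1):
    """True if the load is still covered after the largest units fail.

    Removing the largest units first models the worst case.
    """
    keep = max(len(capacities) - failures, 0)
    remaining = sorted(capacities)[:keep]
    return sum(remaining) >= load


if __name__ == "__main__":
    # A server that needs two power supplies' worth of power:
    print(survives_failures([1, 1], load=2))      # two supplies installed
    print(survives_failures([1, 1, 1], load=2))   # three supplies installed
    # Load sharing: each system must carry the whole load alone.
    print(survives_failures([100, 100], load=100))
    print(survives_failures([60, 60], load=100))
```

The last case is the common trap with load sharing: two systems that are each more than half loaded serve users fine until one of them fails.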


Hot-swap Components

Hot-swap components

– Components can be replaced while running.

– Need n+1 redundancy for this to be useful.

– Don’t need to schedule a downtime.

Issues

– Which parts are hot-swappable?

– May require a few seconds to reconfigure.

– Be sure components are hot-swap, not hot-plug.

Hot Plug and Hot Spare

Hot Plug

– Electrically safe to replace component.

– Part may not be recognized until next reboot.

– Requires downtime, unlike hot swap.

Hot Spare

– Spare component already plugged into system.

– System automatically uses hot spare when disk/CPU board etc. fails.

– Provides n+2 redundancy.

Separate Administrative Network

Reliability

– Allows access to machines even when network is down.

Performance

– Backups require so much bandwidth that they’re often done over their own network.

Security

– Network security monitoring data and logs sent across network should be secured.

Maintenance Contracts

• All machines eventually break.

• Vendors offer variety of maint contracts.

• Non-critical: Next-day or 2-day contract.

• Clusters: If you have many similar hosts (CPU or web farm), then on-site spares may be cheaper than maintenance contract.

• Controlled Model: Use small # of machine types for all servers, so you can afford a spares kit.

• Critical Host: Same-day response or on-site spares.

• Highly Critical: On-site technician + dup machine.

Data Protection

• Avoid desktop backups by storing data on servers. Easy on UNIX, harder on Windows.

• Use RAID for server hardware failures.

– Mirror root disk, higher RAID levels for data.

– Some servers use 16GB Flash drives for root disk.

– Doesn’t protect against software mistakes.

• Server backups

– Use specialized admin network to keep load off main network.

– Use specialized tape jukeboxes to fully automate backups of large data servers (DBs, fileservers).


Keep Servers in Data Center

Data center necessary for server reliability.

– Power (enough power, UPS)

– Climate control (temperature, humidity)

– Fire protection

– High-speed network

– Physical security

Server Operating Systems


Server OS Image

Need greater reliability, security than desktop.

– Remove unnecessary OS components.

– Configure for best security & performance.

Install and config specialized server software.

– Server software: web, db, nfs, dns, ldap, etc.

– May need monitoring software too.

– Configuration: disk space, networking

Server OS install should be automated too.

Remote Administration

Servers must be accessible remotely.

– Allows SA to fix problems quickly at 3am.

– Allows SA to work outside machine room.

Remote Administration

– Serial console and concentrator (UNIX)

– Networked KVM (Windows)

– Remote power control.

– Important to secure remote admin facilities.

Server Appliances

Dedicated hardware + software

– Fileserver (NetApp, Auspex)

– Print servers

– Routers

Advantages

– Performance

– Reliability

– Easy to set up

– Extra capabilities

Disadvantages

– Cost

Many Inexpensive Workstations

Why buy server hardware?

– Buy two cheap rack mount PCs + failover software.

– Works if two PCs cheaper than server.

– Google’s approach with ~450,000 servers.


Blade Servers

• High-density servers on a board.

– CPU

– Memory

– Disk

• Each blade lives in a blade chassis.

Blade Chassis

• Blade chassis provides power, network, and remote management.

• Typically hot swappable, hot-spare.

• Racks can only support 1 svr/RU.

• Blades are higher density, but also require more power and cooling.


Services

Servers vs Services

A server is a piece of hardware.

A service is the function that is provided by one or more servers.

Services

• Distinguish structured computing environment from some standalone PCs.

• Large orgs linked through shared services to ease communication and optimize resources.

• Typical environments have many services

– Fundamental: net, DNS, email, auth, printing.

– Typical: DHCP, backup, directory, file, license.

• Services often depend on other services

– Almost everything depends on DNS.

Providing a Service

A service is more than hardware + software.

A service must be

1. Reliable.

2. Scalable.

3. Monitored.

4. Maintained.

5. Supported.


Servers and Services

For a service to be reliable, servers should:

– Be as simple as possible.

– Have minimum software to run service.

– Depend on as few other services as possible.

– Depend only on services that are at least as reliable as the service running on the server.

– Have access restricted to SAs.

– Be as few as needed for performance + reliability.
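
The advice about dependencies can be checked with simple arithmetic. The sketch below assumes that failures are independent and that a service is down whenever any service it depends on is down, so availabilities multiply. The function name and the figures are illustrative, not measurements.

```python
from math import prod


def effective_availability(own, dependencies):
    """Availability of a service given its own and its dependencies'.

    Assumes independent failures and that any dependency outage
    takes the service down.
    """
    return own * prod(dependencies)


if __name__ == "__main__":
    alone = effective_availability(0.999, [])
    with_deps = effective_availability(0.999, [0.999, 0.99])
    print(f"no dependencies:  {alone:.4f}")
    print(f"two dependencies: {with_deps:.4f}")
```

Under these assumptions the result can never exceed the least reliable dependency, which is why each added dependency, and especially an unreliable one, costs availability.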


Customer Requirements

Customers are the reason for the service.

– How do they intend to use it?

– What features do they need?

– What features would they like to have?

– How critical is the service?

– What levels of availability and support are needed?

Service Level Agreement (SLA)

– Enumerates services.

– Defines level of support.

– Commits to response times for problem types.


Operational Requirements

Essential to designing a reliable service

– What services does it depend upon?

– What other services will depend upon it?

– How does it interoperate with other services?

– How can it be integrated with auth/dir services?

– How does the service scale?

– How can the service be upgraded?

• Downtime requirements.

• What systems are affected?


Open Architecture

Service should be built around open standards

– Check IETF RFCs to see if it’s an open protocol.

– Example service: SMTP

– Example products: exim, postfix, qmail, sendmail.

– Open standards don’t require open source.

Allows vendors to make interoperable products.

– Avoids vendor lock-in.

– Allows vendor competition (cheaper prices for you).

– Decouples client selection from server selection.

– Avoids need for protocol gateways.

Requests for Comments (RFCs)

Documentation for Internet protocols, technologies, and methodologies.

– Standards track RFCs describe Internet standards (TCP, IP, SMTP) and must be approved by IETF.

– Experimental RFCs may become standards.

– Best Current Practice RFCs describe how to run services or use protocols.

– Informational RFCs are a catch-all category including proprietary protocols, April Fool’s jokes, etc.

Available from http://www.rfc-editor.org/

Principles for Designing a Reliable Service

Simplicity

– The more features, the more bugs.

– Simplicity increases reliability, ease of maintenance.

Vendor Relations

– Can be helpful about configuring service.

– Let vendors compete for your business.

– Stick to vendors who develop for your platform.

Machine Independence

Will eventually move service to new host.

– Want to avoid having a downtime.

– Want to avoid reconfiguring every desktop.

Use generic DNS alias for machine

– Mail server has name romero

– DNS alias is smtp

Use virtual IP addresses for non-name svcs

– Machine has usual IP address: 192.168.1.54

– Virtual: ifconfig eth0:0 192.168.1.5

Dedicated Machines

Put each service on its own machine(s).

– If a server crashes, only impacts one service.

– Easier to debug if only one service running.

– Performance tuning easier with one service.

– If you can’t afford a new machine, use a VM.


Environment

Safe environment

– Improves reliability: AC, UPS, physical security.

– Data center usually provides faster network too.

– Only rely on services provided by data center.

Restricted access

– Customers should not need to login to servers.

– More logins decrease stability, performance.

– Even Windows can be stable w/o user logins.

Principles for Designing a Reliable Service

Service components should be tightly coupled.

– Other than redundant components.

– Share same power source, network.

– Reduces service dependencies (single points of failure).

Centralize management of service

– Managed by one set of SAs.

– Support for service by single helpdesk.

– Document service.

Performance

Latency vs throughput

– Latency is delay before data received.

– Throughput is how much data sent per second.

– Performance problems typically affect one.

– Increasing the other will not solve your problem.

Remote sites

– May have high latency to main site.

– Do you need secondary servers at remote sites?
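
A rough model makes the distinction concrete: the time to complete a request is the latency paid on each round trip plus the time to push the data through at the available throughput. The sketch below uses that simplified model. The function name and the numbers are illustrative assumptions, and real protocols add further overhead.

```python
def transfer_time(size_bytes, latency_s, throughput_bytes_per_s, round_trips=1):
    """Seconds to complete a transfer under a simple latency + size/rate model."""
    return round_trips * latency_s + size_bytes / throughput_bytes_per_s


if __name__ == "__main__":
    small = 1_000  # a 1 KB request to a remote site with 100 ms latency
    base = transfer_time(small, 0.1, 1_000_000)
    more_bandwidth = transfer_time(small, 0.1, 10_000_000)
    less_latency = transfer_time(small, 0.01, 1_000_000)
    print(f"baseline:       {base:.4f} s")
    print(f"10x throughput: {more_bandwidth:.4f} s")
    print(f"1/10 latency:   {less_latency:.4f} s")
```

For small requests the latency term dominates, so adding bandwidth barely helps; this is the case where a secondary server at the remote site pays off.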


Capacity Planning

Estimate capacity from testing.

– Test server at 100 qps, 200 qps, until slow.

– Identify resources used by each query

• RAM

• Disk

• Network

• CPU

Can service be split onto multiple servers?

– Can it be done w/o users noticing?
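
The step-load test described above can be sketched as a loop that raises the query rate until response time crosses a limit. In the sketch below `measure_latency` stands in for a real load test, and `toy_latency` is a hypothetical model of a server that saturates near 1000 qps, used only so the example runs; neither comes from the slides.

```python
def find_capacity(measure_latency, start_qps=100, step_qps=100,
                  max_qps=10_000, limit_s=0.5):
    """Return the highest tested rate that stayed within limit_s, or None."""
    best = None
    qps = start_qps
    while qps <= max_qps:
        if measure_latency(qps) > limit_s:
            break  # the server has become "slow": stop increasing load
        best = qps
        qps += step_qps
    return best


def toy_latency(qps):
    """Hypothetical server: 50 ms when idle, saturating near 1000 qps."""
    if qps >= 1000:
        return float("inf")
    return 0.05 / (1 - qps / 1000)


if __name__ == "__main__":
    print(find_capacity(toy_latency, limit_s=0.4), "qps within the limit")
```

In practice the measurement function would also record RAM, disk, network, and CPU use at each step, so that the resource that runs out first can be identified.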


Principles for Designing a Reliable Service

Monitoring

– Availability, problems, performance.

– Auto-alert front line support.

– Customers shouldn’t discover problems before SA.

– Capacity planning: CPU, mem, disk, network, licenses.

Service Rollout

– First impressions are difficult to change.

– Be ready for support: docs, trained helpdesk.

– Use one, some, many technique.
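
The one, some, many technique deploys to a single machine first, then to a small group, then to everyone, checking each stage before widening. A minimal sketch follows; the `deploy` and `verify` callbacks, the stage size, and the host names are hypothetical placeholders.

```python
def rollout_stages(hosts, some=5):
    """Split hosts into three stages: one, some, many."""
    hosts = list(hosts)
    return [hosts[:1], hosts[1:1 + some], hosts[1 + some:]]


def roll_out(hosts, deploy, verify, some=5):
    """Deploy stage by stage; stop before widening if a stage fails.

    Returns (hosts deployed so far, whether every stage verified).
    """
    done = []
    for stage in rollout_stages(hosts, some):
        for host in stage:
            deploy(host)
            done.append(host)
        if not all(verify(host) for host in stage):
            return done, False
    return done, True


if __name__ == "__main__":
    hosts = [f"desk{i:02d}" for i in range(10)]  # hypothetical host names
    done, ok = roll_out(hosts, deploy=lambda h: None, verify=lambda h: True)
    print(len(done), "hosts deployed, all verified:", ok)
```

Stopping at the first failed stage limits the damage to the machines in that stage, which also keeps the bad first impression to a small audience.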


Key Points

Desktop Lifecycle: New, clean, configured, unknown states.

Automated Installs

– Why: consistency, fast recovery, saves time.

– Install types: imaging vs. scripted.

– Components: boot, network, config, data.

– Think about how Principles of SA apply.

Servers vs desktops

– Requirements and hardware differences.

Redundancy

– Full vs n+k redundancy.

– Hot plug vs hot spare.

Services

– Requirements: service, server, customer, operational.

– Machine independence and open architectures.

– Machine independence and open architectures.

Performance: Latency vs. throughput.

