Title Page - UnXis, Inc. | Highly reliable platforms for


THE SCO GROUP 2007
SCO Unix Diagnostics and Troubleshooting
Alexander Sack
Senior Software Engineer
© The SCO Group, Inc. All Rights Reserved
Agenda
 Intro
 Initial System Load (ISL)
 Common Hardware and Driver Issues
 System Tuning
 Networking Tips
 Reporting Problems
 Q&A
ISL: Overview
 Before installing…
   Has the system itself been certified by the OEM?
   Is the motherboard in the CHWP? (Intel whitebox)
   Is it compatible kinda sorta maybe?
   Do I need a third-party HBA diskette?
   Network card supported?
   Does X support my graphics chipset?
   Disk layout issues, multi-boot?
ISL: Debugging
 “Alt-SysReq-H” or “Alt-Ctrl-H” to enter console mode
 “Alt-SysReq-F1” or “Alt-Ctrl-F1” to go back to install screens
 Access to resmgr and the ISL scripts (/isl/ui_modules); note any console messages during install
 IVAR_DEBUG_ALL=1
   Dumps log files in /tmp/log
 Transfer logs to floppy via cpio
   To write: find /tmp/log/* | cpio -oc -O /dev/dsk/f03ht
   To read back: cpio -ic -I /dev/dsk/f03ht
ISL: Issues
 Problem: Installation sees more processors than actually present
 Reasons:
   Bad MPS tables
   Cores listed as physical CPUs in BIOS
   Limited ACPI support (OSR5 only)
 Solution:
   Boot in single processor mode (ATUP) and apply latest MP/SMP pack
   ACPI=Y, USE_XAPIC=Y, ENABLE_JT=Y, MULTICORE=N
   Flash BIOS
ISL: Issues
 Problem: Kernel hangs on boot-up
 Reasons:
   Missing interrupts
   Mixed stepping processors
 Solution:
   Boot in single processor mode (ATUP)
   Swap the mixed-stepping processors so the LOWER stepping processor is in slot 1
   Check BIOS settings, ACPI vs. MPS
   Move add-on PCI card to a different slot
   Set PnP to OFF in BIOS
ISL: Issues
 Problem: Cannot load an HBA from USB floppy
 Reasons:
   BIOS does not support legacy mode (OSR5 only)
   “Device enumeration timeout”
   USB is disabled in the BIOS
   ISL CD left in tray
 Solution:
   Check USB BIOS settings
   Re-plug USB floppy device, verify sdiconfig output on console
   Follow TA article on renaming disk nodes
   Remove CD before load
   Make sure disk was created correctly, dd image to p0 not s0
   Try a different USB floppy device
ISL: Issues
 Problem: Root HBA not found after the DCU runs
 Reasons:
   Didn’t load the right third-party HBA
   Software based RAID issues
   Valid media kit
   USB floppy wasn’t really picked up (ISL will use CD1 for HBA drivers from an ATAPI drive)
 Solution:
   Disconnect USB floppy after HBA loads
   Bind third-party resmgr entry to HBA driver manually via DCU
   Check resmgr entry BOARDID and verify that HBA really supports the card
   Download a later driver from IHV website
ISL: Issues
 Problem: SATA or IDE hangs after loading or fails to recognize my devices
 Reasons:
   Missed interrupts (polling messages)
   DMA incompatibility
   Driver in slave only configuration (OSR6/UW7)
   SATA/PATA card uses custom third-party driver (e.g. Adaptec, Silicon Image, Marvell)
 Solution:
   Check cables and jumpers
   Change mode in BIOS: Legacy, Compatible, Enhanced, AHCI
   ATAPI_DMA_DISABLE=Y
   Avoid cable select (legacy PATA)
ISL: Issues
 Problem: Red screen during mount of CD
 Reasons:
   Missed interrupts (polling messages)
   DMA incompatibility
   Driver in slave only configuration (OSR6/UW7)
   SATA/PATA card uses custom third-party driver (e.g. Adaptec, Silicon Image, Marvell)
 Solution:
   Check cables and jumpers
   Change mode in BIOS: Legacy, Compatible, Enhanced, AHCI
   ATAPI_DMA_DISABLE=Y
   Avoid cable select (legacy PATA)
ISL: Issues
 Problem: NIC is not auto-detected
 Reasons:
   Driver on ISL media is older than card
   Driver issues with card, driver loads but fails
 Solution:
   Defer networking and pkgadd drivers after install
   After install, use SCOadmin Network to configure card
   Bind entry to particular NIC driver if card is within the same family via DCU
   Stick in another card!
ISL: Issues
 Problem: vfs_mountroot() failure
 Reasons:
   Driver on ISL media is older than card
   Driver issues with card, driver loads but fails
   “$static” not added to ROOT HBA sdevice file
 Solution:
   Follow TA to mount disk from ISL
   Use the RECUT media
   Make sure you are using the latest HBA driver
ISL: Issues
 Problem: Screen goes blank after logo appears
 Reasons:
   VESA mode is not supported by card
   On-board chipset uses system memory for framebuffer
 Solution:
   AGP GART is now supported, install latest maintenance pack
   USE_VESA_BIOS=Y
   Use a supported graphics chipset!
ISL: Issues
 Problem: Filesystem is left dirty after ISL and every reboot
 Reasons:
   Aggressive BIOS Power Management
   RAID battery failure
   Target issues – CHECK CONDITIONS
   Older driver and the write cache
 Solution:
   Check RAID battery levels
   Check HBA and target firmware revision
   Update to latest driver
ISL: Issues
 Problem: Installed one OS and another one won’t boot
 Reasons:
   OSR5 8GB limit
   UW7/OSR6 128GB limit
   OSR5 on the first partition of a drive is recommended
   MBR rewritten
 Solution:
   Boot from CD1 and run the UW7/OSR6 fdisk to rewrite the MBR
   Use a third-party boot loader like GRUB
ISL: Issues
 Problem: Failing to create large logical volumes
 Reasons:
   VXFS technical 2TB limit
   OSR6/UW7 1TB physical capacity limit
   HTFS has issues with greater than 1TB filesystems (slow)
   RAID utility issues
 Solution:
   Use VXFS and ODM
   Split volumes into 1TB chunks
   Where possible, always set up volumes with the RAID BIOS or OEM utility
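The "split volumes into 1TB chunks" advice comes down to simple arithmetic. A minimal sketch in Python, assuming 1TB means 2**40 bytes (the slide does not say whether the limit is binary or decimal); the helper name is hypothetical and not part of any SCO tool:

```python
TB = 2**40  # assumption: binary terabyte; the slide does not specify


def split_into_chunks(total_bytes, limit_bytes=TB):
    """Return a list of volume sizes, each no larger than limit_bytes."""
    if total_bytes < 0 or limit_bytes <= 0:
        raise ValueError("total must be >= 0 and limit must be > 0")
    full, rest = divmod(total_bytes, limit_bytes)
    sizes = [limit_bytes] * full
    if rest:
        sizes.append(rest)  # final, smaller volume
    return sizes


if __name__ == "__main__":
    # A 2.5TB array becomes two full 1TB volumes plus one half-size volume.
    for size in split_into_chunks(5 * TB // 2):
        print(f"{size / TB:.2f} TB")
```

The sizes would then be entered in the RAID BIOS or OEM utility, as the slide recommends.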
ISL: Issues
 Problem: ISL load time is very slow
 Reasons:
   ATAPI DMA is disabled
   Write caching is disabled
   Media errors
   Faulty hardware
 Solution:
   Check IDE/SATA settings
   Some OEMs disable write caching, which makes the install slow – future boot parameter
   Check hardware and BIOS settings
ISL: Issues
 Problem: Kernel link failure at end of ISL
 Reasons:
   IRQ conflicts in System driver file
   Driver configuration build error
 Solution:
   Check BIOS settings
   Disable serial or legacy devices you don’t need
   Chroot into fresh install and check build files
   Update HBA drivers if available
ISL: Issues
 Problem: Kernel panics on boot-up
 Reasons:
   Full moon out
   You weren’t nice to the machine that day
   The customer is out to get you
 Solution:
   Boot in single processor mode
   Disable USB via boot parameter or BIOS
   Take note if possible of the stack trace to discern error
   Cry to the OEM
   Cry to SCO support
Hardware and Driver Issues: Disk migration
 Migrating OSR5 disk to OSR6
   Install wd supplement before migration!
   Administer the disk at the source system FIRST before migration
   OSR6 Divvy now works on OSR5 (wd) and OSR6 disks
   Limitations:
     There is no conversion for UW VTOC disks to dual format OSR6
     OSR6 does not support extended VTOC slices
   Always back up your data before migration!
Hardware and Driver Issues: Multi-core
 All Intel-based processors are multi-core!
 ACPI is required to fully support multi-core (OSR6/UW7)
 OSR5 supports multi-core provided MPS tables are sane – has some ACPI support (HT)
 OEMs have stopped testing MPS tables!
 SCO licenses per CPU package, not per core (industry standard)
 Mixed-stepping headaches
Hardware and Driver Issues: HBAs
 What driver to use?
   If in doubt, always use the driver diskette with the higher IHVVERSION in it!
   Supported cards can be found in the Drvmap files of the HBA driver/btld package
   http://pciids.sourceforge.net/
   Sometimes adding an OEM branded BOARDID will work – sometimes it will panic your system!
   “echo pcilong | ndcfg”
 Management utilities are packaged with the driver if available
 Recut media and maintenance packs include latest drivers
 Read the README posted on the SCO download area!
System Tuning: General
 Migrating from OSR5 to OSR6
   DO NOT BLINDLY import OSR5 tunables into OSR6
   E.g. buffer cache has different use on OSR6
 Identify the performance problem you are trying to solve first! [ GOLDEN RULE ]
 Take measurements
 /etc/conf/bin/idtune
   SCOadmin has a wrapper for idtune
System Tuning: Performance
 Performance Tuning
   Identify bottleneck
     rtpm, prfstat, sar, prof, lprof
   CPU performance
     sar -u
       00:00:00  %usr  %sys  %wio  %idle  %intr
       00:00:01    30    10    10     46      4
     high usr: investigate with truss, prof
     high sys, intr: investigate with prfstat
     high wio: storage throughput
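The triage rule on the sar -u slide (high usr, high sys/intr, high wio) can be sketched as a small classifier. This is only an illustration: the column order is taken from the slide, while the 25% cut-off for "high" and both function names are assumptions, not SCO guidance.

```python
COLUMNS = ["usr", "sys", "wio", "idle", "intr"]  # column order as shown on the slide


def parse_sar_u(line):
    """Parse one 'sar -u' data row: a timestamp followed by five percentages."""
    fields = line.split()
    return dict(zip(COLUMNS, (float(x) for x in fields[1:6])))


def suggest_next_step(sample, threshold=25.0):
    """Map a sample to the follow-up tools named on the slide.

    threshold is an assumed cut-off for "high"; tune it for your workload.
    """
    hints = []
    if sample["usr"] >= threshold:
        hints.append("high usr: investigate with truss, prof")
    if sample["sys"] >= threshold or sample["intr"] >= threshold:
        hints.append("high sys/intr: investigate with prfstat")
    if sample["wio"] >= threshold:
        hints.append("high wio: look at storage throughput")
    return hints


if __name__ == "__main__":
    row = "00:00:01 30 10 10 46 4"  # the sample values from the slide
    for hint in suggest_next_step(parse_sar_u(row)):
        print(hint)
```

With the slide's sample and the assumed threshold, only the usr hint is reported.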
System Tuning: Simple Example
 [example figures not captured in this transcript]
System Tuning: Storage
 Storage Performance
   Hardware configuration
     Device topology
       don’t connect slow devices and fast devices on the same bus, e.g. put your slow tape drive on a separate controller
     Cabling
       ensure your cables are up to specifications
     Hardware RAID
       performance (RAID 0) vs. integrity (RAID 1, RAID 5)
   Filesystem tuning
     fsadm, block size, increase logsize (@ mkfs only)
     mount options: tmplog
     ODM: dramatic performance boost for $99
System Tuning: Memory
 Memory
   Avoid swapping
   DEDICATED_MEMORY, use if using shared memory
     mkdev dedicated
     Dedicated memory reserves physical memory
     Saves kernel virtual address space
     Reduces paging, uses large mappings (PSE)
   SEGKMEM_PSE_BYTES
   Add more memory!
System Tuning: Filesystem
 Tuning for largefile support
   HDATLIM, SDATLIM, HVMMLIM, SVMMLIM, HFSZLIM, SFSZLIM set to 0x7fffffff (unlimited)
   /etc/conf/bin/idbuild -B && init 6
   fsadm /mountpoint or raw device
     fsadm -o largefiles /
   OSR6 defaults to largefiles, UW7 does not
 Building large file aware applications
   -D_FILE_OFFSET_BITS=64
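As a quick check of the limits named on the largefile slide, the sketch below compares a set of tunable values against 0x7fffffff. The tunable names and the "unlimited" value come from the slide; the function name and the sample values are hypothetical, and how you collect the current values from the system is left open.

```python
UNLIMITED = 0x7FFFFFFF  # value the slide gives for "unlimited" (2**31 - 1)

LIMIT_TUNABLES = ("HDATLIM", "SDATLIM", "HVMMLIM", "SVMMLIM", "HFSZLIM", "SFSZLIM")


def not_unlimited(current):
    """Return the slide's tunables that are missing or not set to UNLIMITED."""
    return [name for name in LIMIT_TUNABLES if current.get(name) != UNLIMITED]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    current = {
        "HDATLIM": 0x7FFFFFFF,
        "SDATLIM": 0x7FFFFFFF,
        "HVMMLIM": 0x7FFFFFFF,
        "SVMMLIM": 0x7FFFFFFF,
        "HFSZLIM": 0x40000000,
    }
    print("still to raise:", ", ".join(not_unlimited(current)))
```

Any name it reports would still need raising before the idbuild and reboot step on the slide.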
Networking Tips: Configuration
 Network configuration
   netconfig
     drivers installed in /etc/inst/nd/
     bcfg files are parsed by ndcfg
     /etc/confnet.d/inet/interface is configured
   at boot /etc/tcp (cf. S69inet on UW) is run to link the driver into dlpi - initialize -U
   STREAMS based network stack
 ndcfg
   useful for displaying info about the system
   geared toward network device driver writers
Networking Tips: Tuning and Tools
 Network monitoring & tuning tools
   netstat
   ifconfig
   inconfig
   ndstat
   ndcfg
   traceroute
   ping
   tcpdump
   dlpid logging
     dlpid -l <logfile> /etc/inst/nd/dlpidPIPE
     or edit /etc/default/dlpid
       LOG=<logfile>
 NIC failover
   automatically and transparently switch to a backup NIC in the event of failure of the primary
   Chains of backup NICs supported
Networking Tips: Common Issues
 Network is UP but can’t connect to other systems
   is DNS configured correctly?
   netstat -rna
     do you have a default route?
 Network performance is poor
   check cabling
   ndstat -l
     collisions
   inconfig
   nfsstat
Networking Tips: Common Issues
 Network responds to pings but can’t login
   are the daemons running?
   licensed?
 Multiple hosts with the same IP or MAC
   arp -an (-n disables name resolution)
     ? (132.147.103.1) at xx:xx:xx:xx:xx:xx (802.3)
     ? (132.147.103.9) at xx:xx:xx:xx:xx:xx (802.3)
 Stopping and starting the interface
   ifconfig net0 down
   /etc/tcp stop – daemons stopped, NIC is UP
   /etc/tcp shutdown – everything down
   /etc/nd stop, then /etc/nd start
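Scanning arp -an output by eye for duplicate addresses gets tedious on a busy segment. The sketch below assumes the line format shown on the slide ("? (IP) at MAC (802.3)"); the MAC addresses in the sample are made up and the function name is hypothetical.

```python
import re
from collections import defaultdict

# Matches "(a.b.c.d) at aa:bb:cc:dd:ee:ff" as in the slide's arp -an sample.
ARP_LINE = re.compile(r"\((\d+\.\d+\.\d+\.\d+)\)\s+at\s+([0-9A-Fa-f:]+)")


def find_conflicts(arp_text):
    """Return (MACs claimed by several IPs, IPs claimed by several MACs)."""
    ips_by_mac = defaultdict(set)
    macs_by_ip = defaultdict(set)
    for line in arp_text.splitlines():
        match = ARP_LINE.search(line)
        if not match:
            continue  # skip headers, incomplete entries, blank lines
        ip, mac = match.group(1), match.group(2).lower()
        ips_by_mac[mac].add(ip)
        macs_by_ip[ip].add(mac)
    shared_macs = {m: sorted(i) for m, i in ips_by_mac.items() if len(i) > 1}
    shared_ips = {i: sorted(m) for i, m in macs_by_ip.items() if len(m) > 1}
    return shared_macs, shared_ips


if __name__ == "__main__":
    sample = """\
? (132.147.103.1) at 00:a0:c9:11:22:33 (802.3)
? (132.147.103.9) at 00:a0:c9:11:22:33 (802.3)
? (132.147.103.12) at 00:a0:c9:44:55:66 (802.3)
"""
    macs, ips = find_conflicts(sample)
    print("MACs with several IPs:", macs)
    print("IPs with several MACs:", ips)
```

One MAC answering for several IPs can be legitimate (interface aliases, proxy ARP), so treat the result as a list of entries to check rather than a verdict. Spotting one IP with several MACs usually needs output gathered at different times or from different hosts, concatenated before parsing.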
Reporting Problems
 crash
   Primarily used for panic analysis
     /var/spool/dump
     dumpmemory to generate a crash dump on a live system
     crash -a <dumpfile> will produce a listing suitable for SCO support
     provide dumpfile, /stand/unix, all of /etc/conf/mod.d, /usr/sbin/crash
   Useful crash commands
     ps, as, trace, u, eng, od, addstruct, help
     walk data structures using od
       od -f
     ksh style history buffer
 lsof, can save hours of fun on a live system
Reporting Problems
 When reporting problems to support:
   Establish a reproducible case (if possible)
   Save any crash related files
   Note stack trace, crash -a
   Save system log files
     /var/adm/
 Include hardware specs when filing a bug
   run sysinfo
 Be aware of changes made to /stand/boot
   bootparam
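One way to stay aware of changes to /stand/boot is to keep a copy before editing and diff the two afterwards. The sketch below assumes the file holds KEY=VALUE lines with '#' comments, which is an assumption about the format rather than something the slides state; the function names are hypothetical and the parameter names in the sample are borrowed from the ISL slides.

```python
def parse_params(text):
    """Parse KEY=VALUE lines, ignoring blanks and '#' comments (assumed format)."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


def diff_params(before, after):
    """Return (key, old, new) for every parameter added, removed, or changed."""
    old, new = parse_params(before), parse_params(after)
    return [
        (key, old.get(key), new.get(key))
        for key in sorted(set(old) | set(new))
        if old.get(key) != new.get(key)
    ]


if __name__ == "__main__":
    saved = "ACPI=N\nMULTICORE=N\n"
    current = "ACPI=Y\nUSE_XAPIC=Y\nMULTICORE=N\n"
    for key, was, now in diff_params(saved, current):
        print(f"{key}: {was} -> {now}")
```

Including such a list in a support ticket tells the engineer which boot parameters differ from the installed defaults.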
Q&A