Title Page - UnXis, Inc. | Highly reliable platforms for
THE SCO GROUP 2007
SCO Unix Diagnostics and Troubleshooting
Alexander Sack
Senior Software Engineer
© The SCO Group, Inc. All Rights Reserved
Agenda
Intro
Initial System Load (ISL)
Common Hardware and Driver Issues
System Tuning
Networking Tips
Reporting Problems
Q&A
ISL: Overview
Before installing…
Has the system itself been certified by the OEM?
Is the motherboard in the CHWP? (Intel whitebox)
Is it compatible kinda sorta maybe?
Do I need a third-party HBA diskette?
Network card supported?
Does X support my graphic chipset?
Disk layout issues, multi-boot?
ISL: Debugging
“Alt-SysRq-H” or “Alt-Ctrl-H” to enter console mode
“Alt-SysRq-F1” or “Alt-Ctrl-F1” to go back to the install screens
Access to resmgr and the ISL scripts (/isl/ui_modules); note any console messages during install
IVAR_DEBUG_ALL=1
Dumps log files in /tmp/log
Transfer logs to floppy via cpio
E.g. find /tmp/log/* | cpio -oc -O /dev/dsk/f03ht
Read the logs back with: cpio -ic -I /dev/dsk/f03ht
ISL: Issues
Problem: Installation sees more processors than actually present
Reasons:
Bad MPS tables
Cores listed as physical CPUs in BIOS
Limited ACPI support (OSR5 only)
Solution:
Boot in single processor mode (ATUP) and apply the latest MP/SMP pack
ACPI=Y, USE_XAPIC=Y, ENABLE_JT=Y, MULTICORE=N
Flash BIOS
ISL: Issues
Problem: Kernel hangs on boot-up
Reasons:
Missing interrupts
Mixed stepping processors
Solution:
Boot in single processor mode (ATUP)
Swap the mixed-stepping processors so that the LOWER stepping processor is in slot 1
Check BIOS settings, ACPI vs. MPS
Move add-on PCI card to a different slot
Set PnP to OFF in BIOS
ISL: Issues
Problem: Cannot load an HBA driver from USB floppy
Reasons:
BIOS does not support legacy mode (OSR5 only)
“Device enumeration timeout”
USB is disabled in the BIOS
ISL CD left in tray
Solution:
Check USB BIOS settings
Re-plug USB floppy device, verify sdiconfig output on console
Follow TA article on renaming disk nodes
Remove CD before load
Make sure the diskette was created correctly: dd the image to p0, not s0
Try a different USB floppy device
ISL: Issues
Problem: Root HBA not found after the DCU runs
Reasons:
Didn’t load the right third-party HBA
Software based RAID issues
Valid media kit
USB floppy wasn’t really picked up (ISL will use CD1 for HBA drivers from an ATAPI drive)
Solution:
Disconnect USB floppy after HBA loads
Bind third-party resmgr entry to HBA driver manually via DCU
Check the resmgr entry BOARDID and verify that the HBA driver really supports the card
Download a later driver from IHV website
ISL: Issues
Problem: SATA or IDE driver hangs after loading or fails to recognize devices
Reasons:
Missed interrupts (polling messages)
DMA incompatibility
Driver in slave only configuration (OSR6/UW7)
SATA/PATA card uses custom third-party driver (e.g. Adaptec, Silicon Image, Marvell)
Solution:
Check cables and jumpers
Change mode in BIOS: Legacy, Compatible, Enhanced, AHCI
ATAPI_DMA_DISABLE=Y
Avoid cable select (legacy PATA)
ISL: Issues
Problem: Red screen during mount of CD
Reasons:
Missed interrupts (polling messages)
DMA incompatibility
Driver in slave only configuration (OSR6/UW7)
SATA/PATA card uses custom third-party driver (e.g. Adaptec, Silicon Image, Marvell)
Solution:
Check cables and jumpers
Change mode in BIOS: Legacy, Compatible, Enhanced, AHCI
ATAPI_DMA_DISABLE=Y
Avoid cable select (legacy PATA)
ISL: Issues
Problem: NIC is not auto-detected
Reasons:
Driver on ISL media is older than card
Driver issues with card, driver loads but fails
Solution:
Defer networking during ISL and pkgadd the drivers after install
After install, use SCOadmin Network to configure the card
Via the DCU, bind the entry to a particular NIC driver if the card is within the same family
Stick in another card!
ISL: Issues
Problem: vfs_mountroot() failure
Reasons:
Driver on ISL media is older than card
Driver issues with card, driver loads but fails
“$static” not added to ROOT HBA sdevice file
Solution:
Follow TA to mount disk from ISL
Use the RECUT media
Make sure you are using the latest HBA driver
ISL: Issues
Problem: Screen goes blank after logo appears
Reasons:
VESA mode is not supported by card
On-board chipset uses system memory for framebuffer
Solution:
AGP GART is now supported; install the latest maintenance pack
USE_VESA_BIOS=Y
Use a supported graphics chipset!
ISL: Issues
Problem: Filesystem is left dirty after ISL and every reboot
Reasons:
Aggressive BIOS Power Management
RAID battery failure
Target issues – CHECK CONDITIONS
Older driver and the write cache
Solution:
Check RAID battery levels
Check HBA and target firmware revision
Update to latest driver
ISL: Issues
Problem: Installed one OS and another one won’t boot
Reasons:
OSR5 8GB limit
UW7/OSR6 128GB limit
OSR5 on the first partition of a drive is recommended
MBR rewritten
Solution:
Boot from CD1 and run the UW7/OSR6 fdisk to rewrite the MBR
Use a third-party boot loader like GRUB
ISL: Issues
Problem: Failing to create large logical volumes
Reasons:
VXFS technical 2TB limit
OSR6/UW7 1TB physical capacity limit
HTFS has issues with greater than 1TB filesystems (slow)
RAID utility issues
Solution:
Use VXFS and ODM
Split volumes in 1TB chunks
Where possible, always set up volumes with the RAID BIOS or OEM utility
ISL: Issues
Problem: ISL load time is very slow
Reasons:
ATAPI DMA is disabled
Write caching is disabled
Media errors
Faulty hardware
Solution:
Check IDE/SATA settings
Some OEMs disable write caching, which makes install slow (future boot parameter)
Check hardware and BIOS settings
ISL: Issues
Problem: Kernel link failure at end of ISL
Reasons:
IRQ conflicts in System driver file
Driver configuration build error
Solution:
Check BIOS settings
Disable serial or legacy devices you don’t need
Chroot into fresh install and check build files
Update HBA drivers if available
ISL: Issues
Problem: Kernel panics on boot-up
Reasons:
Full moon out
You weren’t nice to the machine that day
The customer is out to get you
Solution:
Boot in single processor mode
Disable USB via boot parameter or BIOS
If possible, note the stack trace to help identify the error
Cry to the OEM
Cry to SCO support
Hardware and Driver Issues: Disk migration
Migrating OSR5 disk to OSR6
Limitations:
Install wd supplement before migration!
Administer the disk at the source system FIRST before migration
OSR6 Divvy now works on OSR5 (wd) and OSR6 disks
There is no conversion for UW VTOC disks to dual format OSR6
OSR6 does not support extended VTOC slices
Always back up your data before migration!
Hardware and Driver Issues: Multi-core
All Intel based processors are multi-core!
ACPI is required to fully support multi-core (OSR6/UW7)
OSR5 supports multi-core provided MPS tables are sane – has some ACPI support (HT)
OEMs have stopped testing MPS tables!
SCO licenses per CPU package, not per core (industry standard)
Mixed-stepping headaches
Hardware and Driver Issues: HBAs
What driver to use?
If in doubt, always use the driver diskette with the higher IHVVERSION in it!
Supported cards can be found in the Drvmap files of the HBA driver/btld package
http://pciids.sourceforge.net/
Sometimes adding an OEM-branded BOARDID will work – sometimes it will panic your system!
“echo pcilong | ndcfg”
Management utilities are packaged with the driver if available
Recut media and maintenance packs include the latest drivers
Read the README posted on the SCO download area!
System Tuning: General
Migrating from OSR5 to OSR6
DO NOT BLINDLY import OSR5 tunables into OSR6
E.g. buffer cache has different use on OSR6
Identify the performance problem you are trying to solve first! [ GOLDEN RULE ]
Take measurements
/etc/conf/bin/idtune
SCOadmin has a wrapper for idtune
System Tuning: Performance
Performance Tuning
Identify bottleneck
rtpm, prfstat, sar, prof, lprof
CPU performance
sar -u
00:00:00  %usr  %sys  %wio  %idle  %intr
00:00:01    30    10    10     46      4
high usr, investigate with truss, prof
high sys, intr, investigate with prfstat
high wio, storage throughput
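The three rules of thumb above can be sketched as a small triage helper. This is only an illustration: the function names and the 25% threshold are assumptions made for the example, not values from SCO documentation, and the input is assumed to be a single data row in the `sar -u` column order shown on this slide.

```python
def parse_sar_line(line):
    """Split a data row such as '00:00:01 30 10 10 46 4' into five floats.

    Assumed column order (from the slide): %usr %sys %wio %idle %intr.
    """
    fields = line.split()
    return [float(value) for value in fields[1:6]]


def classify_cpu_sample(usr, sys, wio, idle, intr, threshold=25.0):
    """Return investigation hints for one sample.

    threshold is a hypothetical cut-off chosen for illustration only.
    idle is accepted so a parsed row can be passed through unchanged;
    no rule uses it.
    """
    hints = []
    if usr >= threshold:
        hints.append("high usr: investigate with truss, prof")
    if sys >= threshold or intr >= threshold:
        hints.append("high sys/intr: investigate with prfstat")
    if wio >= threshold:
        hints.append("high wio: look at storage throughput")
    return hints


sample = parse_sar_line("00:00:01 30 10 10 46 4")
print(classify_cpu_sample(*sample))
```

In practice you would feed it several samples taken under load rather than one row, since a single interval can be misleading.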
System Tuning: Simple Example
System Tuning: Storage
Storage Performance
Hardware configuration
Device topology
don’t connect slow devices and fast devices on the same bus
e.g. put your slow tape drive on a separate controller
Cabling
ensure your cables are up to specifications
Hardware RAID
performance (RAID 0) vs. integrity (RAID 1, RAID 5)
Filesystem tuning
fsadm, block size, increase logsize (at mkfs time only)
mount options; tmplog
ODM: dramatic performance boost for $99
System Tuning: Memory
Memory
Avoid swapping
DEDICATED_MEMORY, use if using shared memory
mkdev dedicated
Dedicated memory reserves physical memory
Saves kernel virtual address space
Reduces paging, uses large mappings (PSE)
SEGKMEM_PSE_BYTES
Add more memory!
System Tuning: Filesystem
Tuning for largefile support
HDATLIM, SDATLIM, HVMMLIM, SVMMLIM, HFSZLIM, SFSZLIM set to 0x7fffffff (unlimited)
/etc/conf/bin/idbuild -B && init 6
fsadm /mountpoint or raw device
fsadm -o largefiles /
OSR6 defaults to largefiles, UW7 does not
Building large file aware applications
-D_FILE_OFFSET_BITS=64
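As a rough illustration of why applications need to be built large file aware, the arithmetic below shows the largest byte offset a signed file offset can hold at 32 and 64 bits. It assumes, as is usual on 32-bit systems, that the file offset type is a signed integer whose width -D_FILE_OFFSET_BITS=64 widens from 32 to 64 bits; the function name is made up for the example.

```python
def max_signed_offset(bits):
    """Largest byte offset representable in a signed integer of the given width."""
    return 2 ** (bits - 1) - 1


limit_32 = max_signed_offset(32)
limit_64 = max_signed_offset(64)

# A 32-bit signed offset tops out one byte short of 2 GiB.
print(hex(limit_32), limit_32 + 1 == 2 * 2 ** 30)
# A 64-bit signed offset is far beyond any filesystem limit discussed here.
print(hex(limit_64))
```

An application compiled without the flag cannot address data past the first limit, even on a filesystem that has largefiles enabled.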
Networking Tips: Configuration
Network configuration
netconfig
drivers installed in /etc/inst/nd/
bcfg files are parsed by ndcfg
/etc/confnet.d/inet/interface is configured
at boot, /etc/tcp (cf. S69inet on UW) is run to link the driver into dlpi (initialize -U)
STREAMS based network stack
ndcfg
useful for displaying info about the system
geared toward network device driver writers
Networking Tips: Tuning and Tools
Network monitoring & tuning tools
netstat
ifconfig
inconfig
ndstat
ndcfg
traceroute
ping
tcpdump
dlpid logging
dlpid -l <logfile> /etc/inst/nd/dlpidPIPE
or edit /etc/default/dlpid
LOG=<logfile>
NIC failover
automatically and transparently switch to a backup NIC in the event of failure of the primary
Chains of backup NICs supported
Networking Tips: Common Issues
Network is UP but can’t connect to other systems
is DNS configured correctly?
netstat -rna
do you have a default route?
Network performance is poor
check cabling
ndstat -l
collisions
inconfig
nfsstat
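The "do you have a default route?" check can be scripted against saved routing-table output. The sketch below assumes the destination is the first column and that a default route appears as either "default" or 0.0.0.0, which is the common netstat convention; the sample table is hypothetical and only reuses the 132.147.103.x addresses from these slides.

```python
def has_default_route(netstat_output):
    """Return True if any routing-table row has a default destination.

    Assumes the destination is the first whitespace-separated field and
    that a default route is shown as 'default' or '0.0.0.0'.
    """
    for line in netstat_output.splitlines():
        fields = line.split()
        if fields and fields[0] in ("default", "0.0.0.0"):
            return True
    return False


# Hypothetical sample, not captured from a real system.
SAMPLE = """\
Destination      Gateway          Flags
127.0.0.1        127.0.0.1        UH
132.147.103.0    132.147.103.9    U
"""

print(has_default_route(SAMPLE))
print(has_default_route(SAMPLE + "default          132.147.103.1    UG\n"))
```

The first call reports no default route, which would explain a host that is up but cannot reach systems outside its own subnet.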
Networking Tips: Common Issues
Network responds to pings but can’t log in
are the daemons running?
licensed?
Multiple hosts with the same IP or MAC
arp -an (-n disables name resolution)
? (132.147.103.1) at xx:xx:xx:xx:xx:xx (802.3)
? (132.147.103.9) at xx:xx:xx:xx:xx:xx (802.3)
Stopping and starting the interface
ifconfig net0 down
/etc/tcp stop – daemons stopped, NIC is UP
/etc/tcp shutdown – everything down
/etc/nd stop; /etc/nd start
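When the ARP table is long, spotting one MAC address answering for several IPs by eye is tedious. The sketch below scans text in the `arp -an` line format shown on this slide and reports any MAC that appears against more than one IP. The regular expression, the function name and the sample MAC addresses are illustrative assumptions.

```python
import re
from collections import defaultdict

# Matches lines like: ? (132.147.103.1) at 00:11:22:33:44:55 (802.3)
ARP_LINE = re.compile(r"\((?P<ip>[\d.]+)\) at (?P<mac>[0-9A-Fa-f:]+)")


def find_shared_macs(arp_output):
    """Map each MAC seen on more than one IP to the sorted list of those IPs."""
    ips_by_mac = defaultdict(set)
    for match in ARP_LINE.finditer(arp_output):
        ips_by_mac[match.group("mac").lower()].add(match.group("ip"))
    return {mac: sorted(ips) for mac, ips in ips_by_mac.items() if len(ips) > 1}


# Hypothetical sample with made-up MAC addresses.
SAMPLE = """\
? (132.147.103.1) at 00:11:22:33:44:55 (802.3)
? (132.147.103.9) at 00:11:22:33:44:55 (802.3)
? (132.147.103.7) at 00:aa:bb:cc:dd:ee (802.3)
"""

print(find_shared_macs(SAMPLE))
```

A shared MAC is only a hint: routers doing proxy ARP and hosts with IP aliases legitimately show up this way, so confirm on the hosts themselves before changing anything.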
Reporting Problems
crash
Primarily used for panic analysis
/var/spool/dump
dumpmemory to generate a crash dump on a live system
crash -a <dumpfile> will produce a listing suitable for SCO support
provide dumpfile, /stand/unix, all of /etc/conf/mod.d, /usr/sbin/crash
Useful crash commands
ps, as, trace, u, eng, od, addstruct, help
walk data structures using od
od -f
ksh style history buffer
lsof can save hours of fun on a live system
Reporting Problems
When reporting problems to support:
Establish a reproducible case (if possible)
Save any crash related files
Note stack trace, crash -a
Save system log files
/var/adm/
Include hardware specs when filing a bug
run sysinfo
Be aware of changes made to /stand/boot
bootparam
Q&A
37