Re: [suse-sles-e] SLES9 Sun X4100 system freeze

From: Alexei_Roudnev (Alexei_Roudnev_at_exigengroup.com)
Date: Mon Oct 09 2006 - 20:08:44 CEST


Message-ID: <0df901c6ebcd$f5bb1760$6f31a8c0@sjc.exigengroup.com>

Did you configure the system to reboot on panic? A few options must be
set: panic=N with N > 0, and panic_on_oops. By default, SLES doesn't reboot
on panic (which is a bug, of course!)
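For reference, the two settings live in /proc/sys/kernel/panic and /proc/sys/kernel/panic_on_oops (sysctl names kernel.panic and kernel.panic_on_oops; panic=N can also be given on the kernel command line). Below is a minimal Python sketch, not a SUSE tool, that interprets them and says whether the box would come back by itself. The function names are made up for illustration, and the "> 0" test simply follows the advice above. It only helps if the freeze really is a panic or oops: a hard lockup that never reaches the panic path will not trigger a reboot.

```python
from pathlib import Path

# Standard Linux locations of kernel.panic and kernel.panic_on_oops.
SYSCTL_DIR = Path("/proc/sys/kernel")


def panic_policy(panic_timeout: int, panic_on_oops: int) -> dict:
    """Interpret the two settings mentioned above (hypothetical helper).

    panic_timeout: seconds to wait before rebooting after a panic;
                   0 means the machine just sits there.
    panic_on_oops: 1 escalates an oops to a full panic, 0 lets the kernel limp on.
    """
    # Per the advice above, only a positive timeout leads to a reboot.
    reboots_on_panic = panic_timeout > 0
    return {
        "reboots_on_panic": reboots_on_panic,
        # An oops only ends in a reboot if it is first escalated to a panic.
        "reboots_on_oops": reboots_on_panic and panic_on_oops == 1,
    }


def read_current_policy(sysctl_dir: Path = SYSCTL_DIR) -> dict:
    """Read the live values from /proc and interpret them (hypothetical helper)."""
    timeout = int((sysctl_dir / "panic").read_text().strip())
    on_oops = int((sysctl_dir / "panic_on_oops").read_text().strip())
    return panic_policy(timeout, on_oops)
```

On the server itself, read_current_policy() reports the live state. Setting, for example, kernel.panic = 30 and kernel.panic_on_oops = 1 in /etc/sysctl.conf (30 seconds is only an example value) would make both flags come out True.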

----- Original Message -----
From: "Chris Puttick" <c.puttick@oxfordarch.co.uk>
To: <suse-sles-e@suse.com>
Sent: Monday, October 09, 2006 5:24 AM
Subject: [suse-sles-e] SLES9 Sun X4100 system freeze

Hi all

Having a little problem with a critical SLES server. Once in a while it
stops. Full on stop without warning - no error messages, no preceding
slowdown, no resources hitting limits. Device still pings on main interface
but not secondaries, which I assume is related to the out of band management
capabilities of the hardware, but it does not output anything to VGA or
respond to keyboard, SSH, telnet, etc.

Hardware is a Sun X4100 (dual AMD dual-core, 8GB RAM) running SLES 9 x64 SP3
and VMware Server with 6 or so VMs (you can see how critical this one is...).
The failures have been irregular, with fairly lengthy periods between them,
e.g. the last two were 6 days apart and the previous one a few weeks before
that. The only consistent pattern we can identify is the last entries in
the messages log immediately before each failure:

Sep 30 10:42:34 BRILL kernel: hda: irq timeout: status=0xd0 { Busy }
Sep 30 10:42:34 BRILL kernel: hda: irq timeout: error=0xd0LastFailedSense 0x0d
Sep 30 10:43:04 BRILL kernel: hda: ATAPI reset timed-out, status=0x80
Sep 30 10:43:34 BRILL kernel: ide0: reset timed-out, status=0x80
Sep 30 10:43:34 BRILL kernel: hda: status timeout: status=0x80 { Busy }
Sep 30 10:43:34 BRILL kernel: hda: status timeout: error=0x80LastFailedSense 0x08
Sep 30 10:43:34 BRILL kernel: hda: drive not ready for command
Sep 30 10:44:04 BRILL kernel: hda: ATAPI reset timed-out, status=0x80

Oct 6 09:32:47 BRILL kernel: hda: irq timeout: status=0xd0 { Busy }
Oct 6 09:32:47 BRILL kernel: hda: irq timeout: error=0xd0LastFailedSense 0x0d
Oct 6 09:33:17 BRILL kernel: hda: ATAPI reset timed-out, status=0x80
Oct 6 09:33:47 BRILL kernel: ide0: reset timed-out, status=0x80
Oct 6 09:33:47 BRILL kernel: hda: status timeout: status=0x80 { Busy }
Oct 6 09:33:47 BRILL kernel: hda: status timeout: error=0x80LastFailedSense 0x08
Oct 6 09:33:47 BRILL kernel: hda: drive not ready for command
Oct 6 09:34:17 BRILL kernel: hda: ATAPI reset timed-out, status=0x80
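To check whether every freeze lines up with one of these bursts, a small Python sketch can pull the hda/ide0/ATAPI kernel lines out of a syslog file and group them by day. It assumes the classic syslog line layout shown above; the function and pattern names are hypothetical.

```python
import re

# Classic syslog layout, e.g.
# "Oct 6 09:32:47 BRILL kernel: hda: irq timeout: status=0xd0 { Busy }"
SYSLOG_KERNEL_LINE = re.compile(
    r"^(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:\s+(?P<msg>.*)$"
)
# Messages from the first IDE channel (hda, ide0) or mentioning ATAPI.
IDE_MESSAGE = re.compile(r"^(?:hda|ide0):|ATAPI")


def ide_incidents(lines):
    """Group IDE/ATAPI kernel messages by day (hypothetical helper).

    Returns {"Mon D": [(time, message), ...]} in the order the days first appear.
    """
    incidents = {}
    for line in lines:
        m = SYSLOG_KERNEL_LINE.match(line.strip())
        if not m or not IDE_MESSAGE.search(m.group("msg")):
            continue  # not a kernel line, or not from the IDE/ATAPI layer
        day = f"{m.group('month')} {int(m.group('day'))}"
        incidents.setdefault(day, []).append((m.group("time"), m.group("msg")))
    return incidents
```

For example, `with open("/var/log/messages") as f: print(ide_incidents(f))` lists one entry per day with IDE trouble, which can then be compared against the dates of the outages.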

/dev/hda is the DVD-ROM drive. However, the drive was empty on both
occasions and there was no (user) reason for it to be accessed. The log
was last rotated on 20/09/2006, and these are the only entries in it
referencing hda or ATAPI. The only other possibly related information is
the output of mount, which shows two devices mounted for the DVD-ROM: the
DVD drive proper (/dev/hda) and an AMI Virtual CDROM (/dev/sr0):

/dev/sda2 on / type reiserfs (rw,acl,user_xattr)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
tmpfs on /dev/shm type tmpfs (rw)
devpts on /dev/pts type devpts (rw,mode=0620,gid=5)
/dev/sr0 on /media/cdrom type subfs (ro,nosuid,nodev,fs=cdfss,procuid,iocharset=utf8)
/dev/hda on /media/dvd type subfs (ro,nosuid,nodev,fs=cdfss,procuid,iocharset=utf8)
usbfs on /proc/bus/usb type usbfs (rw)
/dev/sr0 on /media/usb-AmericanMegatrendsInc-VirtualCdromDevice:0:0:0 type subfs (ro,nosuid,nodev,fs=cdfss,procuid,iocharset=utf8)
/dev/sdj on /eqlsan type ext2 (rw)

There is and has been no disc actually in the drive for over a month.
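To see at a glance which devices are mounted via subfs, and where, a small Python sketch can parse mount output like the listing above and report devices that appear at more than one mount point. It assumes mount points contain no spaces; the helper names are hypothetical.

```python
import re

# One line of `mount` output: "<device> on <mountpoint> type <fstype> (<options>)"
MOUNT_LINE = re.compile(
    r"^(?P<dev>\S+) on (?P<mnt>\S+) type (?P<fstype>\S+) \((?P<opts>[^)]*)\)$"
)


def mounts_by_device(lines):
    """Map each device to its [(mountpoint, fstype), ...] list (hypothetical helper)."""
    table = {}
    for line in lines:
        m = MOUNT_LINE.match(line.strip())
        if m:
            table.setdefault(m.group("dev"), []).append(
                (m.group("mnt"), m.group("fstype"))
            )
    return table


def multiply_mounted(table):
    """Keep only devices mounted at more than one point (hypothetical helper)."""
    return {dev: mnts for dev, mnts in table.items() if len(mnts) > 1}
```

Fed the listing above, multiply_mounted() would single out /dev/sr0 with its two subfs mount points, while /dev/hda shows up once, at /media/dvd.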

All suggestions for the source of the problem and/or a solution warmly
welcomed!

Regards

Chris

Chris Puttick
CIO
Oxford Archaeology: Exploring the Human Journey
Direct: +44 (0)1865 263 818
Switchboard: +44 (0)1865 263 800
Mobile: +44 (0)7914 402 907
http://thehumanjourney.net


---------------------------------------------------------------------
To unsubscribe, e-mail: suse-sles-e-unsubscribe@suse.com
For additional commands, e-mail: suse-sles-e-help@suse.com




This archive was generated by hypermail 2.1.7 : Mon Oct 09 2006 - 20:05:25 CEST