Re: [suse-sles-e] Kernel bug causing NFS block

From: Reinhard Weh (reinhard.weh_at_baeurer.de)
Date: Fri Jul 14 2006 - 23:07:24 CEST


Message-ID: <44B8078C.4010403@baeurer.de>

Hello Johan,

Your SLES system is running kernel version 2.6.5-7.97-bigsmp, which is the kernel from the original CD.

On SLES9 we now run Service Pack 3 plus the YOU (YaST Online Update) updates, which gives kernel-smp-2.6.5-7.267-bigsmp, and some NFS bugs have been fixed in that kernel.
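
To confirm which of the two kernels a machine is actually on, something like the following rough sketch may help. It only compares the numeric fields of the two release strings from this thread; the helper name release_key is made up for illustration, and it assumes the flavour suffix (here "-bigsmp") contains no digits.

```python
import re

def release_key(release):
    """Turn a kernel release string into a tuple of integers for comparison."""
    # Only the numeric fields are kept; a suffix such as "-bigsmp" is ignored.
    # Assumption: the suffix itself contains no digits.
    return tuple(int(n) for n in re.findall(r"\d+", release))

running = "2.6.5-7.97-bigsmp"    # release shown in the quoted kernel dump
updated = "2.6.5-7.267-bigsmp"   # release after SP3 plus online updates

if release_key(running) < release_key(updated):
    print(f"{running} is older than {updated}")
```

The comparison is numeric on purpose: as plain strings, "97" would sort after "267". On a live system, platform.release() from the Python standard library returns the same string as uname -r and could be used in place of the hard-coded value.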

regards
reinhard

Johan Kielbaey wrote:
> Hello,
>
> We have a server that appears to lose its NFS connection to 2 of its 4 NFS shares.
>
> r200a-back:/vol/cits /cits-vol nfs rw,bg,vers=3,tcp,timeo=600,retrans=2,rsize=32768,wsize=32768,hard,intr 0 0
> r200a-back:/vol/share02/cacheprod /cits-vol/cache nfs rw,bg,vers=3,tcp,timeo=600,retrans=2,rsize=32768,wsize=32768,hard,intr 0 0
> r200a-back:/vol/cits1 /cits1-vol nfs rw,bg,vers=3,tcp,timeo=600,retrans=2,rsize=32768,wsize=32768,hard,intr 0 0
> r200a-back:/vol/share04/storecits /cits-vol/store nfs rw,bg,vers=3,tcp,tcp,timeo=600,retrans=2,rsize=32768,wsize=32768,hard,intr 0 0
>
> Even though the shares are mounted as interruptible, they block. After issuing 3-4 commands that block on the hung NFS shares (df, ls, ...), the server freezes and a hard reset is needed. The other NFS shares keep working until the server freezes, of course. Even the console no longer responds.
>
> In the messages file I saw a kernel stack dump.
>
> Jul 14 11:10:45 s30235 kernel: ------------[ cut here ]------------
> Jul 14 11:10:45 s30235 kernel: kernel BUG at kernel/workqueue.c:170!
> Jul 14 11:10:45 s30235 kernel: invalid operand: 0000 [#1]
> Jul 14 11:10:45 s30235 kernel: SMP
> Jul 14 11:10:45 s30235 kernel: CPU: 0
> Jul 14 11:10:45 s30235 kernel: EIP: 0060:[<c013836d>] Not tainted
> Jul 14 11:10:45 s30235 kernel: EFLAGS: 00010207 (2.6.5-7.97-bigsmp)
> Jul 14 11:10:45 s30235 kernel: EIP is at worker_thread+0x21d/0x230
> Jul 14 11:10:45 s30235 kernel: eax: cde56ae8 ebx: 00000293 ecx: cde56ae4 edx: cde56ae8
> Jul 14 11:10:45 s30235 kernel: esi: f7f88000 edi: cde56ae8 ebp: cde56a00 esp: f7f97f3c
> Jul 14 11:10:45 s30235 kernel: ds: 007b es: 007b ss: 0068
> Jul 14 11:10:45 s30235 kernel: Process events/0 (pid: 10, threadinfo=f7f96000 task=cde82c60)
> Jul 14 11:10:45 s30235 kernel: Stack: 00000002 00000001 00000000 f7f88020 c0312fc0 f7f96000 f7f8800c ffffffff
> Jul 14 11:10:45 s30235 kernel: ffffffff 00000001 00000000 c01225e0 00010000 00000000 00001923 0549586a
> Jul 14 11:10:45 s30235 kernel: 00000000 00000000 cde82c60 c01225e0 00100100 00200200 cde95f64 f7f97fac
> Jul 14 11:10:45 s30235 kernel: Call Trace:
> Jul 14 11:10:45 s30235 kernel: [<c0312fc0>] xprt_socket_autoclose+0x0/0x40
> Jul 14 11:10:45 s30235 kernel: [<c01225e0>] default_wake_function+0x0/0x10
> Jul 14 11:10:45 s30235 kernel: [<c01225e0>] default_wake_function+0x0/0x10
> Jul 14 11:10:45 s30235 kernel: [<c0138150>] worker_thread+0x0/0x230
> Jul 14 11:10:45 s30235 kernel: [<c013bf79>] kthread+0xf9/0x12d
> Jul 14 11:10:45 s30235 kernel: [<c013be80>] kthread+0x0/0x12d
> Jul 14 11:10:45 s30235 kernel: [<c0107005>] kernel_thread_helper+0x5/0x10
> Jul 14 11:10:45 s30235 kernel:
> Jul 14 11:10:45 s30235 kernel: Code: 0f 0b aa 00 bb 79 34 c0 e9 50 ff ff ff 8d b6 00 00 00 00 83
> Jul 14 11:11:45 s30235 kernel: <3>RPC: error 5 connecting to server 10.192.101.242
>
> After this dump, more RPC errors are logged.
>
> Jul 14 11:34:45 s30235 kernel: RPC: error 5 connecting to server 10.192.101.242
> Jul 14 11:44:45 s30235 kernel: RPC: error 5 connecting to server 10.192.101.242
> Jul 14 12:38:44 s30235 kernel: RPC: error 5 connecting to server 10.192.101.242
> Jul 14 13:32:44 s30235 kernel: RPC: error 5 connecting to server 10.192.101.242
> Jul 14 13:43:44 s30235 kernel: RPC: error 5 connecting to server 10.192.101.242
> Jul 14 13:51:44 s30235 kernel: RPC: error 5 connecting to server 10.192.101.242
> Jul 14 14:04:44 s30235 kernel: RPC: error 5 connecting to server 10.192.101.242
> Jul 14 14:22:44 s30235 kernel: RPC: error 5 connecting to server 10.192.101.242
> Jul 14 14:37:44 s30235 kernel: RPC: error 5 connecting to server 10.192.101.242
> Jul 14 14:42:44 s30235 kernel: RPC: error 5 connecting to server 10.192.101.242
> Jul 14 14:52:44 s30235 kernel: RPC: error 5 connecting to server 10.192.101.242
> Jul 14 15:44:46 s30235 kernel: nfs_statfs: statfs error = 512
>
> Prior to the kernel stack dump there are no entries indicating NFS errors.
>
> Has anyone else encountered this?
>
> Thanks,
>
> Johan
>
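
The observation in the quoted message that no NFS errors appear before the kernel dump can be checked mechanically. The following is only a minimal sketch: the function name is hypothetical, the sample lines are copied from the quoted log, and on a real system the lines would presumably be read from the messages file instead.

```python
def first_bug_and_rpc_errors(lines):
    """Return (index of the first 'kernel BUG' line or None, indexes of RPC error lines)."""
    bug_index = None
    rpc_indexes = []
    for i, line in enumerate(lines):
        if bug_index is None and "kernel BUG at" in line:
            bug_index = i
        if "RPC: error" in line:
            rpc_indexes.append(i)
    return bug_index, rpc_indexes

# Three lines taken from the quoted log, in their original order.
sample = [
    "Jul 14 11:10:45 s30235 kernel: kernel BUG at kernel/workqueue.c:170!",
    "Jul 14 11:11:45 s30235 kernel: <3>RPC: error 5 connecting to server 10.192.101.242",
    "Jul 14 11:34:45 s30235 kernel: RPC: error 5 connecting to server 10.192.101.242",
]

bug, rpc = first_bug_and_rpc_errors(sample)
# RPC errors logged before the BUG line would contradict the observation.
before = [i for i in rpc if bug is not None and i < bug]
print(f"BUG at line {bug}, {len(rpc)} RPC errors, {len(before)} before the BUG")
```

If the count of RPC errors before the BUG line is zero on the full log as well, that fits the picture of the workqueue BUG coming first and the RPC connection errors following from it, though the script by itself cannot prove the cause.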
