[sles-beta] SLES11-SP3 x86_64 RC1 Problem Storage SAN lpfc module FCF discovery

urs.frey at post.ch urs.frey at post.ch
Tue May 7 04:06:04 MDT 2013


Hi
I am testing SLES11-SP3 RC1 x86_64 on an HP ProLiant BL465c G7 blade (AMD Opteron), attached to an EMC VMAX SAN via FCoE through an NC551i Emulex OneConnect 10Gb CNA.

Suddenly, during normal operation, I lose a SAN path on my test server.
Instead of the path being reinstated once it comes back up, I see thousands of messages like these in my /var/log/messages:

May  6 21:43:01 h04wwl /usr/sbin/cron[1686]: (root) CMD (/usr/local/scripts/cpqhealth_mon > /dev/null 2>&1)
May  6 21:45:01 h04wwl /usr/sbin/cron[1862]: (root) CMD (/usr/local/scripts/check_multipath.sh > /dev/null 2>&1)
May  6 21:45:01 h04wwl /usr/sbin/cron[1863]: (oracle) CMD (/appl/ora/oraenv/bck/dynarcbck.sh > /dev/null 2>&1)
May  6 21:49:01 h04wwl /usr/sbin/cron[2224]: (root) CMD (/usr/local/scripts/query_patchnix.sh > /dev/null 2>&1)
May  6 21:50:01 h04wwl /usr/sbin/cron[2412]: (root) CMD (/usr/local/scripts/check_multipath.sh > /dev/null 2>&1)
May  6 21:50:01 h04wwl /usr/sbin/cron[2413]: (oracle) CMD (/appl/ora/oraenv/bck/dynarcbck.sh > /dev/null 2>&1)
May  6 21:55:01 h04wwl /usr/sbin/cron[2860]: (root) CMD (/usr/local/scripts/check_multipath.sh > /dev/null 2>&1)
May  6 21:55:01 h04wwl /usr/sbin/cron[2861]: (oracle) CMD (/appl/ora/oraenv/bck/dynarcbck.sh > /dev/null 2>&1)
May  6 21:58:22 h04wwl kernel: [621025.614280] lpfc 0000:04:00.2: 0:3300 In-use FCF (0) modified, perform FCF rediscovery
May  6 21:58:22 h04wwl kernel: [621025.680799] lpfc 0000:04:00.2: 0:2546 New FCF event, evt_tag:x3, index:x0
May  6 21:58:22 h04wwl kernel: [621025.680820] lpfc 0000:04:00.2: 0:3300 In-use FCF (0) modified, perform FCF rediscovery
May  6 21:58:22 h04wwl kernel: [621025.726467] lpfc 0000:04:00.2: 0:2546 New FCF event, evt_tag:x4, index:x0
May  6 21:58:22 h04wwl kernel: [621025.726485] lpfc 0000:04:00.2: 0:3300 In-use FCF (0) modified, perform FCF rediscovery
May  6 21:58:22 h04wwl kernel: [621025.772060] lpfc 0000:04:00.2: 0:2546 New FCF event, evt_tag:x5, index:x0
May  6 21:58:22 h04wwl kernel: [621025.772078] lpfc 0000:04:00.2: 0:3300 In-use FCF (0) modified, perform FCF rediscovery
May  6 21:58:22 h04wwl kernel: [621025.825965] lpfc 0000:04:00.2: 0:2546 New FCF event, evt_tag:x6, index:x0
May  6 21:58:22 h04wwl kernel: [621025.825983] lpfc 0000:04:00.2: 0:3300 In-use FCF (0) modified, perform FCF rediscovery
May  6 21:58:22 h04wwl kernel: [621025.871796] lpfc 0000:04:00.2: 0:2546 New FCF event, evt_tag:x7, index:x0
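
To put a number on the flood of messages, the log can be summarized per device and message ID. The following is only a minimal sketch: the function name `count_lpfc_messages` is made up for illustration, and the regular expression assumes the syslog layout shown in the excerpt above (kernel timestamp in brackets, then `lpfc <PCI address>: <n>:<message ID>`).

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   "... kernel: [621025.614280] lpfc 0000:04:00.2: 0:3300 In-use FCF (0) modified, ..."
# Assumption: the log layout is the same as in the excerpt above.
LPFC_LINE = re.compile(
    r"kernel: \[\s*\d+\.\d+\] lpfc (?P<pci>[0-9a-f:.]+): \d+:(?P<msg_id>\d+) "
)


def count_lpfc_messages(lines):
    """Count lpfc kernel messages per (PCI address, message ID) pair."""
    counts = Counter()
    for line in lines:
        match = LPFC_LINE.search(line)
        if match:
            counts[(match.group("pci"), match.group("msg_id"))] += 1
    return counts


# Usage (hypothetical):
#   with open("/var/log/messages") as log:
#       for key, n in count_lpfc_messages(log).most_common():
#           print(key, n)
```

Lines that do not come from lpfc, such as the cron entries, are simply skipped.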

After I manually issue a LIP on the CNA, the SAN path is detected again and reinstated.
I observed quite similar behavior on SLES11-SP3 Beta4 as well.
Why is the SAN path not reinstated automatically?
What does the message "In-use FCF (0) modified, perform FCF rediscovery" mean?
In my view this should be handled fully automatically, with no manual intervention necessary.
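
For reference, the manual LIP mentioned above can be issued through the FC transport class in sysfs by writing to the `issue_lip` attribute of the affected host. The sketch below is only an illustration of that workaround: the helper names are made up, the host name (for example `host3`) has to be looked up on the system, and writing to the real sysfs file requires root.

```python
from pathlib import Path


def read_port_state(host, sysfs_root="/sys/class/fc_host"):
    """Return the port_state string of an FC host, e.g. 'Online' or 'Linkdown'."""
    return (Path(sysfs_root) / host / "port_state").read_text().strip()


def issue_lip(host, sysfs_root="/sys/class/fc_host"):
    """Trigger a LIP on the given FC host by writing to its issue_lip attribute.

    sysfs_root is a parameter only so the function can be tried against a
    scratch directory; on a real system the default path is used.
    """
    lip_file = Path(sysfs_root) / host / "issue_lip"
    if not lip_file.exists():
        raise FileNotFoundError(f"{lip_file} not found; is {host} an FC host?")
    lip_file.write_text("1\n")
    return lip_file
```

Checking `read_port_state` before and after `issue_lip` shows whether the port came back without further intervention.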

This works perfectly on SLES11-SP2, and it also worked fine until SLES11-SP3 Beta4.
I assume that we have a problem either in the lpfc.ko kernel module or in the SLES11-SP3 RC1 kernel itself.

I can reproduce it simply by taking down one CNA switch path and bringing it up again.
BUT when I disable one SAN path on the VMAX itself, multipath works perfectly and also reinstates the path without a problem after the SAN path is enabled again.

So there must be a different event causing the problem.


Questions:
What should be set for safe operation on the SAN?
What about lpfc_fcf_failover_policy?
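
When comparing settings between SLES11-SP2 and SLES11-SP3 RC1, it helps to know the value currently in effect. The sketch below reads a module parameter from sysfs. It assumes the lpfc driver exposes the parameter under `/sys/module/lpfc/parameters/`, which should be verified on the installed driver; the helper name is made up, and the meaning of each value is best checked with `modinfo lpfc`.

```python
from pathlib import Path


def read_module_parameter(module, name, sysfs_root="/sys/module"):
    """Return the current value of a kernel module parameter, or None.

    None means the module is not loaded or does not expose the parameter
    at the assumed sysfs location.
    """
    param_file = Path(sysfs_root) / module / "parameters" / name
    try:
        return param_file.read_text().strip()
    except FileNotFoundError:
        return None


# Usage (assumed path, to be verified on the installed driver):
#   print(read_module_parameter("lpfc", "lpfc_fcf_failover_policy"))
```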


Thank you for your feedback

Best regards


Urs Frey
Die Schweizerische Post
Services
Informationstechnologie
Webergutstrasse 12
3030 Bern (Zollikofen)
Telefon : ++41 (0)58 338 58 70
FAX     : ++41 (0)58 667 30 07
E-Mail:   urs.frey at post.ch



