From bugzilla_noreply at suse.com Tue May 3 17:39:07 2022
From: bugzilla_noreply at suse.com (bugzilla_noreply at suse.com)
Date: Tue, 03 May 2022 17:39:07 +0000
Subject: [Bug 1194320] L3: sles12sp4 -> sles15sp1 upgrade of HLI in Azure
results in non-bootable system
In-Reply-To:
References:
Message-ID:
https://bugzilla.suse.com/show_bug.cgi?id=1194320
https://bugzilla.suse.com/show_bug.cgi?id=1194320#c34
--- Comment #34 from Jordan Causey ---
Apologies for the delay in updates on this bug. The customer tested the PTF in
late April and had a similar issue where the host was unable to see attached
storage (multipath devices) after upgrade.
Our suspicion is that this behavior could be related to a lack of multipath
kernel support (dm-multipath.ko), due to the dracut "--no-kernel" param (which
results in an initrd built without kernel modules).
If this hypothesis seems reasonable, we'd like to request a new
SLES15-Migration PTF with the suse_migration_services scripts amended to remove
this flag ("--no-kernel"). From what I can tell, this param is set in
suse_migration_services/units/regenerate_initrd.py in the run_dracut function.
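
To illustrate the requested change, here is a minimal sketch of how a dracut
call could be assembled with the flag made optional. This is not the actual
run_dracut code from the PTF: the helper name build_dracut_command, its
parameters, and the use of "--force" and "--no-hostonly" are assumptions made
for the example. Only "--no-kernel" comes from this bug.

```python
from typing import List, Optional


def build_dracut_command(
    initrd_path: str,
    kernel_version: Optional[str] = None,
    include_kernel_modules: bool = True,
) -> List[str]:
    """Assemble a dracut argument list (hypothetical helper, not DMS code)."""
    # Assumed base options: overwrite an existing image and build a
    # host-independent initrd.
    command = ["dracut", "--force", "--no-hostonly"]
    if not include_kernel_modules:
        # The flag suspected of leaving dm-multipath.ko out of the initrd.
        command.append("--no-kernel")
    # dracut takes the image path and, optionally, the kernel version last.
    command.append(initrd_path)
    if kernel_version is not None:
        command.append(kernel_version)
    return command
```

With the default include_kernel_modules=True, the resulting command carries no
"--no-kernel", which is the behavior being requested for the new PTF.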
--
You are receiving this mail because:
You are on the CC list for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From bugzilla_noreply at suse.com Wed May 4 16:21:26 2022
From: bugzilla_noreply at suse.com (bugzilla_noreply at suse.com)
Date: Wed, 04 May 2022 16:21:26 +0000
Subject: [Bug 1194320] L3: sles12sp4 -> sles15sp1 upgrade of HLI in Azure
results in non-bootable system
In-Reply-To:
References:
Message-ID:
https://bugzilla.suse.com/show_bug.cgi?id=1194320
https://bugzilla.suse.com/show_bug.cgi?id=1194320#c35
--- Comment #35 from Martin Wilck ---
I have never used --no-kernel. Indeed, the dm-multipath module is necessary for
multipath support. I don't know what happens with required modules if the
customer specifies --no-kernel.
From bugzilla_noreply at suse.com Wed May 4 17:09:49 2022
From: bugzilla_noreply at suse.com (bugzilla_noreply at suse.com)
Date: Wed, 04 May 2022 17:09:49 +0000
Subject: [Bug 1194320] L3: sles12sp4 -> sles15sp1 upgrade of HLI in Azure
results in non-bootable system
In-Reply-To:
References:
Message-ID:
https://bugzilla.suse.com/show_bug.cgi?id=1194320
https://bugzilla.suse.com/show_bug.cgi?id=1194320#c36
--- Comment #36 from Jordan Causey ---
(In reply to Martin Wilck from comment #35)
> I have never used --no-kernel. Indeed, the dm-multipath module is necessary
> for multipath support. I don't know what happens with required modules if the
> customer specifies --no-kernel.
In this case, those dracut params are being set by some python scripts
implemented as part of the Distribution Migration System (DMS) RPMs. This is a
bit of a unique scenario, as DMS is typically only used for cloud-hosted VMs,
which would usually not implement multipath.
In this case, these are bare-metal servers that are managed by MSFT in Azure
datacenters. These hosts have their own dedicated on-site storage (NetApp),
hence the need for multipath support.
The dracut man page describes the "--no-kernel" param as:
    --no-kernel
        do not install kernel drivers and firmware files
So that could definitely be the culprit here. Should be pretty easy to test
with a PTF, but this is not a change that would likely be beneficial to other
DMS customers moving forward.
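
One way to confirm the suspicion before a full upgrade test would be to check
whether the generated initrd contains the module at all. The sketch below
assumes the lsinitrd tool shipped with dracut is available on the host. The
function names and the simple substring match are illustrative assumptions,
not part of DMS.

```python
import subprocess


def listing_has_module(listing: str, module_name: str) -> bool:
    """Return True if any path in an initrd file listing names the module."""
    # Modules may be compressed (e.g. dm-multipath.ko.xz), so match on the
    # "/<name>.ko" part of the path rather than the full file name.
    needle = "/" + module_name + ".ko"
    return any(needle in line for line in listing.splitlines())


def initrd_has_module(initrd_path: str, module_name: str) -> bool:
    """Run lsinitrd on the image and search its listing for the module."""
    result = subprocess.run(
        ["lsinitrd", initrd_path],
        check=True,
        capture_output=True,
        text=True,
    )
    return listing_has_module(result.stdout, module_name)
```

Calling initrd_has_module() with the path of the regenerated initrd and
"dm-multipath" would be expected to return False for an image built with
"--no-kernel", if the hypothesis in this bug holds.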
From bugzilla_noreply at suse.com Wed May 4 20:18:56 2022
From: bugzilla_noreply at suse.com (bugzilla_noreply at suse.com)
Date: Wed, 04 May 2022 20:18:56 +0000
Subject: [Bug 1194320] L3: sles12sp4 -> sles15sp1 upgrade of HLI in Azure
results in non-bootable system
In-Reply-To:
References:
Message-ID:
https://bugzilla.suse.com/show_bug.cgi?id=1194320
https://bugzilla.suse.com/show_bug.cgi?id=1194320#c37
--- Comment #37 from Martin Wilck ---
(In reply to Jordan Causey from comment #36)
> In this case, those dracut params are being set by some python scripts
> implemented as part of the Distribution Migration System (DMS) RPMs. This is
> a bit of a unique scenario, as DMS is typically only used for cloud-hosted
> VMs, which would usually not implement multipath.
Still strange, because I can hardly imagine a system that needs not a single
kernel module for booting these days. I did this 25 years ago when I compiled
custom kernels for my systems.
From bugzilla_noreply at suse.com Wed May 4 20:35:28 2022
From: bugzilla_noreply at suse.com (bugzilla_noreply at suse.com)
Date: Wed, 04 May 2022 20:35:28 +0000
Subject: [Bug 1194320] L3: sles12sp4 -> sles15sp1 upgrade of HLI in Azure
results in non-bootable system
In-Reply-To:
References:
Message-ID:
https://bugzilla.suse.com/show_bug.cgi?id=1194320
https://bugzilla.suse.com/show_bug.cgi?id=1194320#c38
--- Comment #38 from Robert Schweikert ---
Agreed, it appears weird that "--no-kernel" would drop things that are
considered required. Anyway, based on the doc it should be safe to drop the
"--no-kernel" option from the dracut command line; we do want kernel modules
after all.
From bugzilla_noreply at suse.com Wed May 4 20:39:01 2022
From: bugzilla_noreply at suse.com (bugzilla_noreply at suse.com)
Date: Wed, 04 May 2022 20:39:01 +0000
Subject: [Bug 1194320] L3: sles12sp4 -> sles15sp1 upgrade of HLI in Azure
results in non-bootable system
In-Reply-To:
References:
Message-ID:
https://bugzilla.suse.com/show_bug.cgi?id=1194320
https://bugzilla.suse.com/show_bug.cgi?id=1194320#c39
--- Comment #39 from Martin Wilck ---
If you want a minimal initrd, you could experiment with "--hostonly=strict".
It's also considered a bit dangerous, but it _should_ identify required kernel
modules.
From bugzilla_noreply at suse.com Wed May 4 20:40:44 2022
From: bugzilla_noreply at suse.com (bugzilla_noreply at suse.com)
Date: Wed, 04 May 2022 20:40:44 +0000
Subject: [Bug 1194320] L3: sles12sp4 -> sles15sp1 upgrade of HLI in Azure
results in non-bootable system
In-Reply-To:
References:
Message-ID:
https://bugzilla.suse.com/show_bug.cgi?id=1194320
https://bugzilla.suse.com/show_bug.cgi?id=1194320#c40
--- Comment #40 from Robert Schweikert ---
For the DMS implementation we want to cast as wide a net as possible and build
the biggest initrd possible with the tools that are already installed on the
system.
Thanks for the info.
From bugzilla_noreply at suse.com Wed May 4 21:03:09 2022
From: bugzilla_noreply at suse.com (bugzilla_noreply at suse.com)
Date: Wed, 04 May 2022 21:03:09 +0000
Subject: [Bug 1194320] L3: sles12sp4 -> sles15sp1 upgrade of HLI in Azure
results in non-bootable system
In-Reply-To:
References:
Message-ID:
https://bugzilla.suse.com/show_bug.cgi?id=1194320
https://bugzilla.suse.com/show_bug.cgi?id=1194320#c41
--- Comment #41 from Jordan Causey ---
(In reply to Robert Schweikert from comment #40)
> For the DMS implementation we want to cast as wide a net as possible and
> build the biggest initrd possible with the tools that are already installed
> on the system.
>
> Thanks for the info.
Sorry for any confusion I may have caused. I checked the DMS Python code where
the dracut "--no-kernel" param is set
(suse_migration_services/units/regenerate_initrd.py), and that
"regenerate_initrd.py" code only appears to be present in the PTF created
specifically for this customer. That code is not used for standard migrations.
I assume standard DMS migrations would have full kernel module support enabled.
I'm not quite sure why the "--no-kernel" param was initially added to the PTF;
however, I think we're in agreement that it would cause the kernel to not see
multipath devices.
From bugzilla_noreply at suse.com Thu May 5 16:48:26 2022
From: bugzilla_noreply at suse.com (bugzilla_noreply at suse.com)
Date: Thu, 05 May 2022 16:48:26 +0000
Subject: [Bug 1194320] L3: sles12sp4 -> sles15sp1 upgrade of HLI in Azure
results in non-bootable system
In-Reply-To:
References:
Message-ID:
https://bugzilla.suse.com/show_bug.cgi?id=1194320
https://bugzilla.suse.com/show_bug.cgi?id=1194320#c42
--- Comment #42 from Bogdano Arendartchuk ---
Keith's project was updated since the PTF was published. As I'm not sure
the changes are intended for this bug, I've added the following patches to
the older version:
-------------------------------------------------------------------
Thu May 5 16:02:10 UTC 2022 - Bogdano Arendartchuk
- Added 243.patch:
+ Remove --no-kernel from host independant initrd
- Added 0001-regenerate_initrd-Do-not-log-no-kernel.patch
+ regenerate_initrd: Do not log --no-kernel
https://ptf.suse.com/f8ce1f65e4eea82157e5797fe7271933/sles15-sp1/23947/x86_64/20220505/
From bugzilla_noreply at suse.com Thu May 5 17:01:19 2022
From: bugzilla_noreply at suse.com (bugzilla_noreply at suse.com)
Date: Thu, 05 May 2022 17:01:19 +0000
Subject: [Bug 1194320] L3: sles12sp4 -> sles15sp1 upgrade of HLI in Azure
results in non-bootable system
In-Reply-To:
References:
Message-ID:
https://bugzilla.suse.com/show_bug.cgi?id=1194320
https://bugzilla.suse.com/show_bug.cgi?id=1194320#c43
--- Comment #43 from Robert Schweikert ---
The "--no-kernel" option was added based on comment #17 in this bug. It then
slipped through my review of the code changes in DMS. Yesterday I merged a
follow-up PR to DMS that removes "--no-kernel".
From bugzilla_noreply at suse.com Fri May 6 15:15:01 2022
From: bugzilla_noreply at suse.com (bugzilla_noreply at suse.com)
Date: Fri, 06 May 2022 15:15:01 +0000
Subject: [Bug 1194320] L3: sles12sp4 -> sles15sp1 upgrade of HLI in Azure
results in non-bootable system
In-Reply-To:
References:
Message-ID:
https://bugzilla.suse.com/show_bug.cgi?id=1194320
https://bugzilla.suse.com/show_bug.cgi?id=1194320#c44
--- Comment #44 from Jordan Causey ---
Thanks for this. The customer has scheduled a time for 5/13/22 to test an
upgrade with this PTF. I'll provide a status update immediately after that
effort.
From bugzilla_noreply at suse.com Mon May 16 20:17:52 2022
From: bugzilla_noreply at suse.com (bugzilla_noreply at suse.com)
Date: Mon, 16 May 2022 20:17:52 +0000
Subject: [Bug 1194320] L3: sles12sp4 -> sles15sp1 upgrade of HLI in Azure
results in non-bootable system
In-Reply-To:
References:
Message-ID:
https://bugzilla.suse.com/show_bug.cgi?id=1194320
https://bugzilla.suse.com/show_bug.cgi?id=1194320#c45
--- Comment #45 from Keith Berger ---
(In reply to Jordan Causey from comment #44)
> Thanks for this. The customer has scheduled a time for 5/13/22 to test an
> upgrade with this PTF. I'll provide a status update immediately after that
> effort.
Any updates, Jordan?
From bugzilla_noreply at suse.com Mon May 16 21:21:57 2022
From: bugzilla_noreply at suse.com (bugzilla_noreply at suse.com)
Date: Mon, 16 May 2022 21:21:57 +0000
Subject: [Bug 1194320] L3: sles12sp4 -> sles15sp1 upgrade of HLI in Azure
results in non-bootable system
In-Reply-To:
References:
Message-ID:
https://bugzilla.suse.com/show_bug.cgi?id=1194320
https://bugzilla.suse.com/show_bug.cgi?id=1194320#c46
--- Comment #46 from Jordan Causey ---
(In reply to Keith Berger from comment #45)
> (In reply to Jordan Causey from comment #44)
> > Thanks for this. The customer has scheduled a time for 5/13/22 to test an
> > upgrade with this PTF. I'll provide a status update immediately after that
> > effort.
>
> Any updates, Jordan?
Unfortunately, just as we were set to begin another test this past Friday, the
customer encountered a production outage and pulled all resources into that
troubleshooting effort.
We had to reschedule our upgrade test until this coming Thursday (5/19) as a
result. I'll update the bug with the outcome of that test as soon as it
completes.
From bugzilla_noreply at suse.com Tue May 24 16:51:10 2022
From: bugzilla_noreply at suse.com (bugzilla_noreply at suse.com)
Date: Tue, 24 May 2022 16:51:10 +0000
Subject: [Bug 1194320] L3: sles12sp4 -> sles15sp1 upgrade of HLI in Azure
results in non-bootable system
In-Reply-To:
References:
Message-ID:
https://bugzilla.suse.com/show_bug.cgi?id=1194320
https://bugzilla.suse.com/show_bug.cgi?id=1194320#c47
--- Comment #47 from Jordan Causey ---
We performed a successful 15 SP1 upgrade today using the PTF provided. We ran
into a couple of unrelated snags with service starts that we were able to
resolve, but nothing related to the DMS migration itself.
The customer is going to perform another end-to-end test next week to validate
our service fixes, but I think we can say this issue is resolved.
Thanks for everyone's assistance (and patience) in working through this.