From: bugzilla_noreply at suse.com (bugzilla_noreply at suse.com)
Date: Tue, 03 May 2022 17:39:07 +0000
Subject: [Bug 1194320] L3: sles12sp4 -> sles15sp1 upgrade of HLI in Azure results in non-bootable system

https://bugzilla.suse.com/show_bug.cgi?id=1194320

https://bugzilla.suse.com/show_bug.cgi?id=1194320#c34

--- Comment #34 from Jordan Causey ---
Apologies for the delay in updates on this bug. The customer tested the PTF in late April and had a similar issue: after the upgrade, the host was unable to see attached storage (multipath devices).

Our suspicion is that this behavior could be related to a lack of multipath kernel support (dm-multipath.ko), due to the dracut "--no-kernel" param (which results in an initrd built without kernel modules). If this hypothesis seems reasonable, we'd like to request a new SLES15-Migration PTF with the suse_migration_services scripts amended to remove this flag ("--no-kernel"). From what I can tell, this param is set in suse_migration_services/units/regenerate_initrd.py in the run_dracut function.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

From: bugzilla_noreply at suse.com (bugzilla_noreply at suse.com)
Date: Wed, 04 May 2022 16:21:26 +0000

https://bugzilla.suse.com/show_bug.cgi?id=1194320#c35

--- Comment #35 from Martin Wilck ---
I have never used --no-kernel. Indeed, the dm-multipath module is necessary for multipath support. I don't know what happens with required modules if the customer specifies --no-kernel.
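One way to check the hypothesis from comment #34 on an affected host is to list the regenerated initrd with dracut's `lsinitrd` tool and look for the dm-multipath module. The Python sketch below is illustrative only. The function names are hypothetical and not part of DMS. It assumes that `lsinitrd <image>` prints an `ls -l`-style listing whose last field is the file path, and that the module may be stored either plain or compressed (for example `dm-multipath.ko.xz`).

```python
import re
import subprocess

# Matches a dm-multipath module file, plain or compressed. The set of
# compression suffixes is an assumption; adjust it for the target release.
_DM_MULTIPATH = re.compile(r"(?:^|/)dm-multipath\.ko(?:\.xz|\.zst|\.gz)?$")


def listing_has_dm_multipath(listing: str) -> bool:
    """Return True if an lsinitrd-style listing contains dm-multipath.

    Assumes each file entry is an ls -l style line whose last
    whitespace-separated field is the path inside the initrd.
    """
    for line in listing.splitlines():
        fields = line.split()
        if fields and _DM_MULTIPATH.search(fields[-1]):
            return True
    return False


def initrd_has_dm_multipath(image: str) -> bool:
    """Run `lsinitrd <image>` and check its listing (hypothetical helper)."""
    result = subprocess.run(
        ["lsinitrd", image], capture_output=True, text=True, check=True
    )
    return listing_has_dm_multipath(result.stdout)
```

If `initrd_has_dm_multipath()` returns False for the initrd that the migration produced, that would be consistent with the "--no-kernel" explanation. The parsing is kept separate from the `lsinitrd` call so that the matching logic can be checked without a real image.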
From: bugzilla_noreply at suse.com (bugzilla_noreply at suse.com)
Date: Wed, 04 May 2022 17:09:49 +0000

https://bugzilla.suse.com/show_bug.cgi?id=1194320#c36

--- Comment #36 from Jordan Causey ---
(In reply to Martin Wilck from comment #35)
> I have never used --no-kernel. Indeed, the dm-multipath module is necessary
> for multipath support. I don't know what happens with required modules if the
> customer specifies --no-kernel.

In this case, those dracut params are being set by some python scripts implemented as part of the Distribution Migration System (DMS) RPMs. This is a bit of a unique scenario, as DMS is typically only used for cloud-hosted VMs, which would usually not implement multipath. Here, these are bare-metal servers that are managed by MSFT in Azure datacenters. These hosts have their own dedicated on-site storage (NetApp), hence the need for multipath support.

The dracut man page describes the "--no-kernel" param as:

    --no-kernel
        do not install kernel drivers and firmware files

So that could definitely be the culprit here. It should be pretty easy to test with a PTF, but this is not a change that would likely be beneficial to other DMS customers moving forward.
From: bugzilla_noreply at suse.com (bugzilla_noreply at suse.com)
Date: Wed, 04 May 2022 20:18:56 +0000

https://bugzilla.suse.com/show_bug.cgi?id=1194320#c37

--- Comment #37 from Martin Wilck ---
(In reply to Jordan Causey from comment #36)
> In this case, those dracut params are being set by some python scripts
> implemented as part of the Distribution Migration System (DMS) RPMs. This is
> a bit of a unique scenario, as DMS is typically only used for cloud-hosted
> VMs, which would usually not implement multipath.

Still strange, because I can hardly imagine a system that doesn't need a single kernel module for booting these days. I did this 25 years ago when I compiled custom kernels for my systems.

From: bugzilla_noreply at suse.com (bugzilla_noreply at suse.com)
Date: Wed, 04 May 2022 20:35:28 +0000

https://bugzilla.suse.com/show_bug.cgi?id=1194320#c38

--- Comment #38 from Robert Schweikert ---
Agreed, it appears weird that "--no-kernel" would drop things that are considered required. Anyway, based on the doc it should be safe to drop the "--no-kernel" option from the command line for dracut; we do want kernel modules after all.
From: bugzilla_noreply at suse.com (bugzilla_noreply at suse.com)
Date: Wed, 04 May 2022 20:39:01 +0000

https://bugzilla.suse.com/show_bug.cgi?id=1194320#c39

--- Comment #39 from Martin Wilck ---
If you want a minimal initrd, you could experiment with "--hostonly=strict". It's also considered a bit dangerous, but it _should_ identify required kernel modules.

From: bugzilla_noreply at suse.com (bugzilla_noreply at suse.com)
Date: Wed, 04 May 2022 20:40:44 +0000

https://bugzilla.suse.com/show_bug.cgi?id=1194320#c40

--- Comment #40 from Robert Schweikert ---
For the DMS implementation we want to cast as wide a net as possible and build the biggest initrd possible with the tools that are already installed on the system.

Thanks for the info.
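Comment #40 states the DMS goal as a generic initrd that casts as wide a net as possible. The hypothetical helper below sketches what such an invocation could look like. It builds a dracut argument list with `--force` and `--no-hostonly` and explicitly rejects `--no-kernel`, which the dracut man page quoted in comment #36 describes as not installing kernel drivers. This is not the actual `run_dracut()` code from `regenerate_initrd.py`. The helper name and its image and kernel-version arguments are assumptions made for illustration.

```python
from typing import List, Optional


def build_dracut_cmd(image: str, kernel_version: str,
                     extra_args: Optional[List[str]] = None) -> List[str]:
    """Build a generic (non-host-only) dracut command line.

    Hypothetical helper for illustration; not the DMS run_dracut() code.
    """
    extra = list(extra_args or [])
    # Per dracut(8), --no-kernel means "do not install kernel drivers and
    # firmware files", which would leave dm-multipath out of the initrd.
    if "--no-kernel" in extra:
        raise ValueError("--no-kernel would drop kernel modules such as dm-multipath")
    cmd = [
        "dracut",
        "--force",        # overwrite an existing image at the same path
        "--no-hostonly",  # generic initrd rather than one tailored to this host
    ]
    cmd.extend(extra)
    # Positional arguments: dracut [OPTION...] [<image> [<kernel version>]]
    cmd.extend([image, kernel_version])
    return cmd
```

The returned list could be passed to `subprocess.run(cmd, check=True)`. The helper rejects `--no-kernel` outright because comments #38 and #41 agree that DMS wants kernel modules in the initrd. Silently filtering the flag out would hide a misconfiguration instead of reporting it.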
From: bugzilla_noreply at suse.com (bugzilla_noreply at suse.com)
Date: Wed, 04 May 2022 21:03:09 +0000

https://bugzilla.suse.com/show_bug.cgi?id=1194320#c41

--- Comment #41 from Jordan Causey ---
(In reply to Robert Schweikert from comment #40)
> For the DMS implementation we want to cast as wide a net as possible and
> build the biggest initrd possible with the tools that are already installed
> on the system.
>
> Thanks for the info.

Sorry for any confusion I may have caused. I checked the DMS Python code where the dracut "--no-kernel" param is set (suse_migration_services/units/regenerate_initrd.py), and that "regenerate_initrd.py" code only appears to be present in the PTF created specifically for this customer. That code is not used for standard migrations. I assume standard DMS migrations would have full kernel module support enabled.

I'm not quite sure why the "--no-kernel" param was added to the PTF initially; however, I think we're in agreement that it would prevent the kernel from seeing the multipath devices.

From: bugzilla_noreply at suse.com (bugzilla_noreply at suse.com)
Date: Thu, 05 May 2022 16:48:26 +0000

https://bugzilla.suse.com/show_bug.cgi?id=1194320#c42

--- Comment #42 from Bogdano Arendartchuk ---
Keith's project was updated since the PTF was published.
As I'm not sure the changes are intended for this bug, I've added the following patches to the older version:

-------------------------------------------------------------------
Thu May 5 16:02:10 UTC 2022 - Bogdano Arendartchuk

- Added 243.patch:
  + Remove --no-kernel from host independent initrd
- Added 0001-regenerate_initrd-Do-not-log-no-kernel.patch
  + regenerate_initrd: Do not log --no-kernel

https://ptf.suse.com/f8ce1f65e4eea82157e5797fe7271933/sles15-sp1/23947/x86_64/20220505/

From: bugzilla_noreply at suse.com (bugzilla_noreply at suse.com)
Date: Thu, 05 May 2022 17:01:19 +0000

https://bugzilla.suse.com/show_bug.cgi?id=1194320#c43

--- Comment #43 from Robert Schweikert ---
The "--no-kernel" option was added based on comment #17 in this bug. It then slipped through my review of the code changes in DMS. Yesterday I merged a follow-up PR to DMS that removes "--no-kernel".

From: bugzilla_noreply at suse.com (bugzilla_noreply at suse.com)
Date: Fri, 06 May 2022 15:15:01 +0000

https://bugzilla.suse.com/show_bug.cgi?id=1194320#c44

--- Comment #44 from Jordan Causey ---
Thanks for this.
The customer has scheduled a time for 5/13/22 to test an upgrade with this PTF. I'll provide a status update immediately after that effort.

From: bugzilla_noreply at suse.com (bugzilla_noreply at suse.com)
Date: Mon, 16 May 2022 20:17:52 +0000

https://bugzilla.suse.com/show_bug.cgi?id=1194320#c45

--- Comment #45 from Keith Berger ---
(In reply to Jordan Causey from comment #44)
> Thanks for this. The customer has scheduled a time for 5/13/22 to test an
> upgrade with this PTF. I'll provide a status update immediately after that
> effort.

Any updates, Jordan?

From: bugzilla_noreply at suse.com (bugzilla_noreply at suse.com)
Date: Mon, 16 May 2022 21:21:57 +0000

https://bugzilla.suse.com/show_bug.cgi?id=1194320#c46

--- Comment #46 from Jordan Causey ---
(In reply to Keith Berger from comment #45)
> (In reply to Jordan Causey from comment #44)
> > Thanks for this. The customer has scheduled a time for 5/13/22 to test an
> > upgrade with this PTF. I'll provide a status update immediately after that
> > effort.
>
> Any updates, Jordan?
Unfortunately, just as we were set to begin another test this past Friday, the customer encountered a production outage and pulled all resources into that troubleshooting effort. As a result, we had to reschedule our upgrade test for this coming Thursday (5/19). I'll update the bug with the outcome of that test as soon as it completes.

From: bugzilla_noreply at suse.com (bugzilla_noreply at suse.com)
Date: Tue, 24 May 2022 16:51:10 +0000

https://bugzilla.suse.com/show_bug.cgi?id=1194320#c47

--- Comment #47 from Jordan Causey ---
We performed a successful 15 SP1 upgrade today using the PTF provided. We ran into a couple of unrelated snags with service starts that we were able to resolve, but nothing related to the DMS migration itself. The customer is going to perform another end-to-end test next week to validate our service fixes, but I think we can say this issue is resolved.

Thanks for everyone's assistance (and patience) in working through this.