<div class="container">
<h1>Recommended update for slurm</h1>
<table class="table table-striped table-bordered">
<tbody>
<tr>
<th>Announcement ID:</th>
<td>SUSE-RU-2023:3759-1</td>
</tr>
<tr>
<th>Rating:</th>
<td>moderate</td>
</tr>
<tr>
<th>References:</th>
<td>
<ul>
<li style="display: inline;">
<a href="https://bugzilla.suse.com/show_bug.cgi?id=1214983">#1214983</a>
</li>
</ul>
</td>
</tr>
<tr>
<th>Affected Products:</th>
<td>
<ul class="list-group">
<li class="list-group-item">HPC Module 15-SP5</li>
<li class="list-group-item">openSUSE Leap 15.5</li>
<li class="list-group-item">SUSE Linux Enterprise Desktop 15 SP5</li>
<li class="list-group-item">SUSE Linux Enterprise High Performance Computing 15 SP5</li>
<li class="list-group-item">SUSE Linux Enterprise Micro 5.5</li>
<li class="list-group-item">SUSE Linux Enterprise Real Time 15 SP5</li>
<li class="list-group-item">SUSE Linux Enterprise Server 15 SP5</li>
<li class="list-group-item">SUSE Linux Enterprise Server for SAP Applications 15 SP5</li>
<li class="list-group-item">SUSE Package Hub 15 15-SP5</li>
</ul>
</td>
</tr>
</tbody>
</table>
<p>An update that has one fix can now be installed.</p>
<h2>Description:</h2>
<p>This update for slurm fixes the following issues:</p>
<ul>
<li>Updated to 23.02.4 with the following changes:</li>
<li>Bug Fixes:<ul>
<li>Fix main scheduler loop not starting after a failover to the backup
controller.</li>
<li>Avoid a slurmctld segfault when specifying
<code>AccountingStorageExternalHost</code> (bsc#1214983).</li>
<li>Fix sbatch return code when <code>--wait</code> is requested on a job array.</li>
<li>Fix collected <code>GPUUtilization</code> values for <code>acct_gather_profile</code> plugins.</li>
<li>Fix <code>slurmrestd</code> handling of job hold/release operations.</li>
<li>Fix step running indefinitely when slurmctld takes more than
<code>MessageTimeout</code> to respond. <code>slurmctld</code> now cancels the step when this is
detected, preventing subsequent steps from getting stuck waiting for
resources to be released.</li>
<li>Fix regression to make <code>job_desc.min_cpus</code> accurate again in <code>job_submit</code>
when requesting a job with <code>--ntasks-per-node</code>.</li>
<li>Fix handling of <code>ArrayTaskThrottle</code> in backfill.</li>
<li>Fix regression in 23.02.2 when checking gres state on <code>slurmctld</code>
startup or reconfigure. Gres changes in the configuration were not
applied on slurmctld startup, and on startup or reconfigure the log contained
messages such as <code>error: Attempt to change gres/gpu Count</code>.</li>
<li>Fix potential double count of gres when dealing with limits.</li>
<li>Fix <code>slurmstepd</code> segfault when <code>ContainerPath</code> is not set in <code>oci.conf</code>.</li>
<li>Fix an issue where jobs requesting licenses were incorrectly rejected.</li>
<li><code>scrontab</code> - Fix cutting off the final character of quoted variables.</li>
<li><code>smail</code> - Fix issues where e-mails at job completion were not being sent.</li>
<li><code>scontrol/slurmctld</code> - fix comma parsing when updating a reservation's
nodes.</li>
<li>Fix <code>--gpu-bind=single</code> binding tasks to the wrong GPUs, leading to some GPUs
having more tasks than they should and other GPUs being unused.</li>
<li>Fix regression in 23.02 that causes slurmstepd to crash when <code>srun</code>
requests more than <code>TreeWidth</code> nodes in a step and uses the pmi2 or
pmix plugin.</li>
<li><code>job_container/tmpfs</code> - Fix <code>%h</code> and <code>%n</code> substitution in <code>BasePath</code>
where <code>%h</code> was substituted as the <code>NodeName</code> instead of the hostname,
and <code>%n</code> was substituted as an empty string.</li>
<li>Fix regression where <code>--cpu-bind=verbose</code> would override
<code>TaskPluginParam</code>.</li>
<li><code>scancel</code> - Fix <code>--clusters/-M</code> for federations. Only filtered jobs
(e.g. <code>-A</code>, <code>-u</code>, <code>-p</code>, etc.) from the specified clusters will be
canceled, rather than all jobs in the federation. Specific job IDs
will still be routed to the origin cluster for cancellation.</li>
</ul>
</li>
<li>Other changes:<ul>
<li>Make spank <code>S_JOB_ARGV</code> item value hold the requested command <code>argv</code>
instead of the <code>srun --bcast</code> value when <code>--bcast</code> requested (only in
local context).</li>
<li><code>scontrol</code> - Permit changes to <code>StdErr</code> and <code>StdIn</code> for pending jobs.</li>
<li><code>scontrol</code> - Reset <code>std{err,in,out}</code> when set to an empty string.</li>
<li><code>slurmrestd</code> - mark environment as a required field for job submission
descriptions.</li>
<li><code>slurmrestd</code> - avoid dumping null in OpenAPI schema required fields.</li>
<li><code>data_parser/v0.0.39</code> - avoid rejecting valid <code>memory_per_node</code> formatted
as a dictionary provided with a job description.</li>
<li><code>data_parser/v0.0.39</code> - avoid rejecting valid <code>memory_per_cpu</code> formatted
as a dictionary provided with a job description.</li>
<li><code>slurmrestd</code> - Return HTTP error code 404 when job query fails.</li>
<li><code>slurmrestd</code> - Add a return schema to the error response for job and license
queries.</li>
<li>Change the log message warning for rate limited users from debug to
verbose.</li>
<li><code>cgroup/v2</code> - Avoid capturing log output for eBPF when constraining
devices, as this can lead to inadvertent failure if the log buffer is too small.</li>
<li>Add an error message when attempting to use <code>sattach</code> on batch or extern
steps.</li>
<li>Reject job <code>ArrayTaskThrottle</code> update requests from unprivileged users.</li>
<li><code>data_parser/v0.0.39</code> - populate description fields of property objects
in generated OpenAPI specifications where defined.</li>
<li><code>slurmstepd</code> - Avoid segfault caused by <code>ContainerPath</code> not being
terminated by <code>/</code> in <code>oci.conf</code>.</li>
<li><code>data_parser/v0.0.39</code> - Change <code>v0.0.39_job_info</code> response to tag
<code>exit_code</code> field as being complex instead of only an unsigned integer.</li>
</ul>
</li>
<li>Updated to 23.02.3 with the following changes:</li>
<li>Bug Fixes:<ul>
<li><code>slurmctld</code> - Fix backup slurmctld crash when it takes control
multiple times.</li>
<li>Fix regression in 23.02.2 that ignored the partition <code>DefCpuPerGPU</code>
setting on the first pass of scheduling a job requesting
<code>--gpus --ntasks</code>.</li>
<li><code>srun</code> - fix issue creating regular and interactive steps because
environment variables were incorrectly set on non-HetSteps.</li>
<li>Fix dynamic nodes getting stuck in allocated states when reconfiguring.</li>
<li>Fix regression in 23.02.2 that set the <code>SLURM_NTASKS</code> environment
variable in sbatch jobs from <code>--ntasks-per-node</code> when <code>--ntasks</code> was not
requested.</li>
<li>Fix regression in 23.02 that caused sbatch jobs to set the wrong number
of tasks when requesting <code>--ntasks-per-node</code> without <code>--ntasks</code>, and also
requesting one of the following options: <code>--sockets-per-node</code>,
<code>--cores-per-socket</code>, <code>--threads-per-core</code> (or <code>--hint=nomultithread</code>),
or <code>-B,--extra-node-info</code>.</li>
<li>Fix double counting suspended job counts on nodes when reconfiguring,
which prevented nodes with suspended jobs from being powered down or
rebooted once the jobs completed.</li>
<li>Fix backfill not scheduling jobs submitted with <code>--prefer</code> and
<code>--constraint</code> properly.</li>
<li><code>mpi/pmix</code> - fix regression introduced in 23.02.2 that caused the permissions
of PMIx shmem-backed files to be incorrect.</li>
<li><code>api/submit</code> - fix memory leaks when submission of batch regular jobs
or batch HetJobs fails (response data is a return code).</li>
<li>Fix regression in 23.02 leading to <code>error()</code> messages being sent at <code>INFO</code>
instead of <code>ERR</code> in syslog.</li>
<li>Fix <code>TresUsageIn[Tot|Ave]</code> calculation for <code>gres/gpumem</code> and
<code>gres/gpuutil</code>.</li>
<li>Fix issue in the GPU plugins where GPU frequencies would only be set if
both GPU memory and GPU frequencies were set, although setting either one
should suffice.</li>
<li>Fix reservation group ACLs not working with the root group.</li>
<li>Fix updating a job with a <code>ReqNodeList</code> greater than the job's node count.</li>
<li>Fix inadvertent permission denied error for <code>--task-prolog</code> and
<code>--task-epilog</code> with filesystems mounted with <code>root_squash</code>.</li>
<li>Fix missing detailed cpu and gres information in json/yaml output from
<code>scontrol</code>, <code>squeue</code> and <code>sinfo</code>.</li>
<li>Fix regression in 23.02 that causes a failure to allocate job steps that
request <code>--cpus-per-gpu</code> and gpus with types.</li>
<li>Fix potentially waiting indefinitely for a defunct process to finish,
which affects various scripts including <code>Prolog</code> and <code>Epilog</code>. This could
have various symptoms, such as jobs getting stuck in a completing state.</li>
<li>Fix losing list of reservations on job when updating job with list of
reservations and restarting the controller.</li>
<li>Fix nodes resuming after down and drain state update requests from
clients older than 23.02.</li>
<li>Fix advanced reservation creation/update when an association that should
have access to it is composed with partition(s).</li>
<li>Fix job layout calculations with <code>--ntasks-per-gpu</code>, especially when
<code>--nodes</code> has not been explicitly provided.</li>
<li>Fix X11 forwarding for jobs submitted from the slurmctld host.</li>
<li>When a job requests <code>--no-kill</code> and one or more nodes fail during the
job, fix subsequent job steps being unable to use some of the remaining
resources allocated to the job.</li>
<li>Fix shared gres allocation when using <code>--tres-per-task</code> with tasks that
span multiple sockets.</li>
<li><code>auth/jwt</code> - Fix memory leak.</li>
</ul>
</li>
<li>Other changes:<ul>
<li><code>openapi/dbv0.0.39/users</code> - If a default account update failed, resulting
in a no-op, the query returned success without any warning. Now a warning
is sent back to the client that the default account wasn't modified.</li>
<li>Avoid job write lock when nodes are dynamically added/removed.</li>
<li><code>burst_buffer/lua</code> - allow jobs to get scheduled sooner after
<code>slurm_bb_data_in</code> completes.</li>
<li><code>openapi/v0.0.39</code> - fix memory leak in <code>_job_post_het_submit()</code>.</li>
<li>Avoid possible <code>slurmctld</code> segfault caused by race condition with already
completed <code>slurmdbd_conn</code> connections.</li>
<li><code>slurmdbd.conf</code> - Check included conf files for 0600 permissions.</li>
<li><code>slurmrestd</code> - fix regression where "oversubscribe" fields were removed from
job descriptions and submissions from v0.0.39 endpoints.</li>
<li><code>accounting_storage/mysql</code> - Query for individual QOS correctly when there
are more than 10.</li>
<li>Add a warning message about ignoring <code>--tres-per-task=license</code> when used
on a step.</li>
<li><code>sshare</code> - Fix command to work when using <code>priority/basic</code>.</li>
<li>Avoid loading <code>cli_filter</code> plugins outside of <code>salloc</code>/<code>sbatch</code>/<code>scron</code>/<code>srun</code>.
This fixes a number of missing symbol problems that can manifest
for executables linked against libslurm (and not <code>libslurmfull</code>).</li>
<li>Allow <code>cloud_reg_addrs</code> to update a dynamically registered node's addresses on
subsequent registrations.</li>
<li>Revert a change in 22.05.5 that prevented tasks from sharing a core if
<code>--cpus-per-task</code> > threads per core, but caused incorrect accounting and
cpu binding. Instead, <code>--ntasks-per-core=1</code> may be requested to prevent
tasks from sharing a core.</li>
<li>Correctly send <code>assoc_mgr</code> lock to mcs plugin.</li>
<li>Avoid unnecessary <code>gres/gpumem</code> and <code>gres/gpuutil</code> <code>TRES</code> position
lookups.</li>
<li><code>sacct</code> - when printing <code>PLANNED</code> time, use end time instead of start
time for jobs cancelled before they started.</li>
<li>Hold the job with "<code>(Reservation ... invalid)</code>" state reason if the
reservation is not usable by the job.</li>
<li><code>sbatch</code> - Added new <code>--export=NIL</code> option.</li>
</ul>
</li>
</ul>
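<p>Several of the changes above affect commands that users run directly. The following is a rough sketch of how they might be used on a cluster already running 23.02.4; the job ID <code>1234</code>, the batch script <code>job.sh</code>, the cluster name <code>cluster1</code>, the user <code>alice</code>, and the file paths are hypothetical placeholders. Consult the <code>sbatch</code>, <code>scontrol</code>, and <code>scancel</code> man pages for the exact semantics of each option.</p>
<pre><code># All job IDs, names, and paths below are hypothetical placeholders.

# 23.02.3: request --ntasks-per-core=1 to keep tasks from sharing a core,
# instead of relying on the reverted 22.05.5 behavior.
sbatch --ntasks=4 --cpus-per-task=4 --ntasks-per-core=1 job.sh

# 23.02.3: new --export=NIL option (see the sbatch man page for how it
# differs from --export=NONE).
sbatch --export=NIL job.sh

# 23.02.4: StdErr and StdIn can now be changed while job 1234 is pending.
scontrol update JobId=1234 StdErr=/tmp/job-1234.err StdIn=/tmp/job-1234.in

# 23.02.4: setting the value to an empty string resets it.
scontrol update JobId=1234 StdErr=

# 23.02.4: in a federation, cancel only alice's jobs on cluster1 rather
# than her jobs across the whole federation.
scancel --clusters=cluster1 --user=alice
</code></pre>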
<h2>Patch Instructions:</h2>
<p>
To install this SUSE update, use the SUSE-recommended
installation methods, such as YaST online_update or "zypper patch".<br/>
Alternatively, you can run the command listed for your product:
</p>
<ul class="list-group">
<li class="list-group-item">
openSUSE Leap 15.5
<br/>
<code>zypper in -t patch SUSE-2023-3759=1 openSUSE-SLE-15.5-2023-3759=1</code>
</li>
<li class="list-group-item">
HPC Module 15-SP5
<br/>
<code>zypper in -t patch SUSE-SLE-Module-HPC-15-SP5-2023-3759=1</code>
</li>
<li class="list-group-item">
SUSE Package Hub 15 15-SP5
<br/>
<code>zypper in -t patch SUSE-SLE-Module-Packagehub-Subpackages-15-SP5-2023-3759=1</code>
</li>
</ul>
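<p>After installing the patch, you may want to confirm that the installed <code>slurm</code> package is at least version 23.02.4. The following is a minimal sketch. It assumes an RPM-based system and GNU <code>sort</code> with version ordering (<code>-V</code>) and quiet checking (<code>-C</code>). The helper name <code>version_ge</code> is hypothetical. The script queries RPM only when <code>rpm</code> is available and the <code>slurm</code> package is installed.</p>
<pre><code>#!/bin/sh
# Assumes rpm and GNU coreutils sort (-V, -C); version_ge is a hypothetical helper.

# version_ge A B: succeed if version A is greater than or equal to version B.
# "sort -V -C" succeeds only when its input is already in version order,
# so "B then A" is ordered exactly when B is less than or equal to A.
version_ge() {
    printf '%s\n' "$2" "$1" | sort -V -C
}

required=23.02.4

# Only query RPM when it exists and the slurm package is installed.
if [ -n "$(command -v rpm)" ]; then
    if rpm -q --quiet slurm; then
        installed=$(rpm -q --queryformat '%{VERSION}' slurm)
        if version_ge "$installed" "$required"; then
            echo "slurm $installed is up to date"
        else
            echo "slurm $installed is older than $required; apply the patch"
        fi
    fi
fi
</code></pre>
<p>Comparing with <code>sort -V</code> rather than as plain strings ensures that a later release such as 23.02.10 is correctly treated as newer than 23.02.4.</p>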
<h2>Package List:</h2>
<ul>
<li>
openSUSE Leap 15.5 (aarch64 ppc64le s390x x86_64)
<ul>
<li>perl-slurm-23.02.4-150500.5.6.1</li>
<li>slurm-cray-23.02.4-150500.5.6.1</li>
<li>libslurm39-23.02.4-150500.5.6.1</li>
<li>slurm-node-23.02.4-150500.5.6.1</li>
<li>slurm-plugins-debuginfo-23.02.4-150500.5.6.1</li>
<li>slurm-rest-debuginfo-23.02.4-150500.5.6.1</li>
<li>slurm-sql-23.02.4-150500.5.6.1</li>
<li>slurm-auth-none-23.02.4-150500.5.6.1</li>
<li>slurm-testsuite-23.02.4-150500.5.6.1</li>
<li>slurm-pam_slurm-23.02.4-150500.5.6.1</li>
<li>slurm-devel-23.02.4-150500.5.6.1</li>
<li>slurm-23.02.4-150500.5.6.1</li>
<li>slurm-slurmdbd-debuginfo-23.02.4-150500.5.6.1</li>
<li>libslurm39-debuginfo-23.02.4-150500.5.6.1</li>
<li>libnss_slurm2-debuginfo-23.02.4-150500.5.6.1</li>
<li>slurm-slurmdbd-23.02.4-150500.5.6.1</li>
<li>slurm-munge-23.02.4-150500.5.6.1</li>
<li>slurm-sview-23.02.4-150500.5.6.1</li>
<li>slurm-torque-23.02.4-150500.5.6.1</li>
<li>slurm-sql-debuginfo-23.02.4-150500.5.6.1</li>
<li>slurm-cray-debuginfo-23.02.4-150500.5.6.1</li>
<li>slurm-torque-debuginfo-23.02.4-150500.5.6.1</li>
<li>libpmi0-23.02.4-150500.5.6.1</li>
<li>slurm-auth-none-debuginfo-23.02.4-150500.5.6.1</li>
<li>slurm-debugsource-23.02.4-150500.5.6.1</li>
<li>libpmi0-debuginfo-23.02.4-150500.5.6.1</li>
<li>libnss_slurm2-23.02.4-150500.5.6.1</li>
<li>slurm-pam_slurm-debuginfo-23.02.4-150500.5.6.1</li>
<li>slurm-hdf5-23.02.4-150500.5.6.1</li>
<li>slurm-node-debuginfo-23.02.4-150500.5.6.1</li>
<li>slurm-lua-debuginfo-23.02.4-150500.5.6.1</li>
<li>perl-slurm-debuginfo-23.02.4-150500.5.6.1</li>
<li>slurm-sview-debuginfo-23.02.4-150500.5.6.1</li>
<li>slurm-plugins-23.02.4-150500.5.6.1</li>
<li>slurm-debuginfo-23.02.4-150500.5.6.1</li>
<li>slurm-munge-debuginfo-23.02.4-150500.5.6.1</li>
<li>slurm-hdf5-debuginfo-23.02.4-150500.5.6.1</li>
<li>slurm-plugin-ext-sensors-rrd-debuginfo-23.02.4-150500.5.6.1</li>
<li>slurm-rest-23.02.4-150500.5.6.1</li>
<li>slurm-plugin-ext-sensors-rrd-23.02.4-150500.5.6.1</li>
<li>slurm-lua-23.02.4-150500.5.6.1</li>
</ul>
</li>
<li>
openSUSE Leap 15.5 (noarch)
<ul>
<li>slurm-doc-23.02.4-150500.5.6.1</li>
<li>slurm-openlava-23.02.4-150500.5.6.1</li>
<li>slurm-config-23.02.4-150500.5.6.1</li>
<li>slurm-config-man-23.02.4-150500.5.6.1</li>
<li>slurm-webdoc-23.02.4-150500.5.6.1</li>
<li>slurm-seff-23.02.4-150500.5.6.1</li>
<li>slurm-sjstat-23.02.4-150500.5.6.1</li>
</ul>
</li>
<li>
HPC Module 15-SP5 (aarch64 x86_64)
<ul>
<li>perl-slurm-23.02.4-150500.5.6.1</li>
<li>slurm-cray-23.02.4-150500.5.6.1</li>
<li>libslurm39-23.02.4-150500.5.6.1</li>
<li>slurm-node-23.02.4-150500.5.6.1</li>
<li>slurm-plugins-debuginfo-23.02.4-150500.5.6.1</li>
<li>slurm-rest-debuginfo-23.02.4-150500.5.6.1</li>
<li>slurm-sql-23.02.4-150500.5.6.1</li>
<li>slurm-auth-none-23.02.4-150500.5.6.1</li>
<li>slurm-pam_slurm-23.02.4-150500.5.6.1</li>
<li>slurm-devel-23.02.4-150500.5.6.1</li>
<li>slurm-23.02.4-150500.5.6.1</li>
<li>slurm-slurmdbd-debuginfo-23.02.4-150500.5.6.1</li>
<li>libslurm39-debuginfo-23.02.4-150500.5.6.1</li>
<li>libnss_slurm2-debuginfo-23.02.4-150500.5.6.1</li>
<li>slurm-slurmdbd-23.02.4-150500.5.6.1</li>
<li>slurm-munge-23.02.4-150500.5.6.1</li>
<li>slurm-sview-23.02.4-150500.5.6.1</li>
<li>slurm-torque-23.02.4-150500.5.6.1</li>
<li>slurm-sql-debuginfo-23.02.4-150500.5.6.1</li>
<li>slurm-cray-debuginfo-23.02.4-150500.5.6.1</li>
<li>slurm-torque-debuginfo-23.02.4-150500.5.6.1</li>
<li>libpmi0-23.02.4-150500.5.6.1</li>
<li>slurm-auth-none-debuginfo-23.02.4-150500.5.6.1</li>
<li>slurm-debugsource-23.02.4-150500.5.6.1</li>
<li>libpmi0-debuginfo-23.02.4-150500.5.6.1</li>
<li>libnss_slurm2-23.02.4-150500.5.6.1</li>
<li>slurm-pam_slurm-debuginfo-23.02.4-150500.5.6.1</li>
<li>slurm-node-debuginfo-23.02.4-150500.5.6.1</li>
<li>slurm-lua-debuginfo-23.02.4-150500.5.6.1</li>
<li>perl-slurm-debuginfo-23.02.4-150500.5.6.1</li>
<li>slurm-sview-debuginfo-23.02.4-150500.5.6.1</li>
<li>slurm-plugins-23.02.4-150500.5.6.1</li>
<li>slurm-debuginfo-23.02.4-150500.5.6.1</li>
<li>slurm-munge-debuginfo-23.02.4-150500.5.6.1</li>
<li>slurm-plugin-ext-sensors-rrd-debuginfo-23.02.4-150500.5.6.1</li>
<li>slurm-rest-23.02.4-150500.5.6.1</li>
<li>slurm-plugin-ext-sensors-rrd-23.02.4-150500.5.6.1</li>
<li>slurm-lua-23.02.4-150500.5.6.1</li>
</ul>
</li>
<li>
HPC Module 15-SP5 (noarch)
<ul>
<li>slurm-webdoc-23.02.4-150500.5.6.1</li>
<li>slurm-doc-23.02.4-150500.5.6.1</li>
<li>slurm-config-23.02.4-150500.5.6.1</li>
<li>slurm-config-man-23.02.4-150500.5.6.1</li>
</ul>
</li>
<li>
SUSE Package Hub 15 15-SP5 (ppc64le s390x)
<ul>
<li>perl-slurm-23.02.4-150500.5.6.1</li>
<li>slurm-cray-23.02.4-150500.5.6.1</li>
<li>slurm-plugins-debuginfo-23.02.4-150500.5.6.1</li>
<li>slurm-node-23.02.4-150500.5.6.1</li>
<li>slurm-rest-debuginfo-23.02.4-150500.5.6.1</li>
<li>slurm-sql-23.02.4-150500.5.6.1</li>
<li>slurm-auth-none-23.02.4-150500.5.6.1</li>
<li>slurm-pam_slurm-23.02.4-150500.5.6.1</li>
<li>slurm-devel-23.02.4-150500.5.6.1</li>
<li>slurm-23.02.4-150500.5.6.1</li>
<li>slurm-slurmdbd-debuginfo-23.02.4-150500.5.6.1</li>
<li>libnss_slurm2-debuginfo-23.02.4-150500.5.6.1</li>
<li>slurm-slurmdbd-23.02.4-150500.5.6.1</li>
<li>slurm-munge-23.02.4-150500.5.6.1</li>
<li>slurm-sview-23.02.4-150500.5.6.1</li>
<li>slurm-torque-23.02.4-150500.5.6.1</li>
<li>slurm-sql-debuginfo-23.02.4-150500.5.6.1</li>
<li>slurm-cray-debuginfo-23.02.4-150500.5.6.1</li>
<li>slurm-torque-debuginfo-23.02.4-150500.5.6.1</li>
<li>libpmi0-23.02.4-150500.5.6.1</li>
<li>slurm-auth-none-debuginfo-23.02.4-150500.5.6.1</li>
<li>slurm-debugsource-23.02.4-150500.5.6.1</li>
<li>libpmi0-debuginfo-23.02.4-150500.5.6.1</li>
<li>libnss_slurm2-23.02.4-150500.5.6.1</li>
<li>slurm-pam_slurm-debuginfo-23.02.4-150500.5.6.1</li>
<li>slurm-hdf5-23.02.4-150500.5.6.1</li>
<li>slurm-node-debuginfo-23.02.4-150500.5.6.1</li>
<li>slurm-lua-debuginfo-23.02.4-150500.5.6.1</li>
<li>perl-slurm-debuginfo-23.02.4-150500.5.6.1</li>
<li>slurm-plugins-23.02.4-150500.5.6.1</li>
<li>slurm-debuginfo-23.02.4-150500.5.6.1</li>
<li>slurm-munge-debuginfo-23.02.4-150500.5.6.1</li>
<li>slurm-hdf5-debuginfo-23.02.4-150500.5.6.1</li>
<li>slurm-rest-23.02.4-150500.5.6.1</li>
<li>slurm-sview-debuginfo-23.02.4-150500.5.6.1</li>
<li>slurm-lua-23.02.4-150500.5.6.1</li>
</ul>
</li>
<li>
SUSE Package Hub 15 15-SP5 (noarch)
<ul>
<li>slurm-doc-23.02.4-150500.5.6.1</li>
<li>slurm-openlava-23.02.4-150500.5.6.1</li>
<li>slurm-config-23.02.4-150500.5.6.1</li>
<li>slurm-config-man-23.02.4-150500.5.6.1</li>
<li>slurm-webdoc-23.02.4-150500.5.6.1</li>
<li>slurm-seff-23.02.4-150500.5.6.1</li>
<li>slurm-sjstat-23.02.4-150500.5.6.1</li>
</ul>
</li>
</ul>
<h2>References:</h2>
<ul>
<li>
<a href="https://bugzilla.suse.com/show_bug.cgi?id=1214983">https://bugzilla.suse.com/show_bug.cgi?id=1214983</a>
</li>
</ul>
</div>