SUSE-SU-2025:01757-1: important: Security update for slurm_24_11
# Security update for slurm_24_11
Announcement ID: SUSE-SU-2025:01757-1
Release Date: 2025-05-29T14:47:58Z
Rating: important
References:
* bsc#1243666
Cross-References:
* CVE-2025-43904
CVSS scores:
* CVE-2025-43904 ( SUSE ): 8.5 CVSS:4.0/AV:L/AC:L/AT:N/PR:L/UI:N/VC:H/VI:H/VA:H/SC:N/SI:N/SA:N
* CVE-2025-43904 ( SUSE ): 7.8 CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H
Affected Products:
* HPC Module 12
* SUSE Linux Enterprise High Performance Computing 12 SP2
* SUSE Linux Enterprise High Performance Computing 12 SP3
* SUSE Linux Enterprise High Performance Computing 12 SP4
* SUSE Linux Enterprise High Performance Computing 12 SP5
* SUSE Linux Enterprise Server 12 SP2
* SUSE Linux Enterprise Server 12 SP3
* SUSE Linux Enterprise Server 12 SP4
* SUSE Linux Enterprise Server 12 SP5
* SUSE Linux Enterprise Server for SAP Applications 12 SP2
* SUSE Linux Enterprise Server for SAP Applications 12 SP3
* SUSE Linux Enterprise Server for SAP Applications 12 SP4
* SUSE Linux Enterprise Server for SAP Applications 12 SP5
An update that solves one vulnerability can now be installed.
## Description:
This update for slurm_24_11 fixes the following issues:
Update to version 24.11.5.
Security issues fixed:
* CVE-2025-43904: Fixed an issue with permission handling for Coordinators
within the accounting system that allowed Coordinators to promote a user to
Administrator (bsc#1243666). A post-update audit is sketched below.
Other changes and issues fixed:
* Changes from version 24.11.5
* Return error to `scontrol` reboot on bad nodelists.
* `slurmrestd` - Report an error when QOS resolution fails for v0.0.40
endpoints.
* `slurmrestd` - Report an error when QOS resolution fails for v0.0.41
endpoints.
* `slurmrestd` - Report an error when QOS resolution fails for v0.0.42
endpoints.
* `data_parser/v0.0.42` - Added the `+inline_enums` flag, which modifies the
output when generating the OpenAPI specification: enum arrays are no longer
defined in their own schemas with references (`$ref`) to them, but are
dumped inline instead (see the sketches after this changelog).
* Fix binding error with `tres-bind map/mask` on partial node allocations.
* Fix `stepmgr` enabled steps being able to request features.
* Reject step creation if requested feature is not available in job.
* `slurmd` - Restrict listening for new incoming RPC requests further into
startup.
* `slurmd` - Avoid `auth/slurm` related hangs of CLI commands during startup
and shutdown.
* `slurmctld` - Restrict processing new incoming RPC requests further into
startup. Stop processing requests sooner during shutdown.
* `slurmctld` - Avoid `auth/slurm` related hangs of CLI commands during
startup and shutdown.
* `slurmctld` - Avoid a race condition during shutdown or reconfigure that
could result in a crash due to delayed processing of a connection while
plugins are unloaded.
* Fix small memleak when getting the job list from the database.
* Fix incorrect printing of `%` escape characters when printing stdio fields
for jobs.
* Fix padding parsing when printing stdio fields for jobs.
* Fix printing `%A` array job id when expanding patterns.
* Fix reservations causing jobs to be held for `Bad Constraints`.
* `switch/hpe_slingshot` - Prevent potential segfault on failed curl request
to the fabric manager.
* Fix printing incorrect array job id when expanding stdio file names. The
`%A` will now be substituted by the correct value.
* `switch/hpe_slingshot` - Fix VNI range not updating on slurmctld restart or
reconfigure.
* Fix steps not being created when using certain combinations of `-c` and `-n`
smaller than the job's requested resources, when using stepmgr and nodes are
configured with `CPUs == Sockets*CoresPerSocket`.
* Permit configuring the number of retry attempts to destroy the CXI service
via the new `destroy_retries` `SwitchParameters` option (see the sketches
after this changelog).
* Do not reset `memory.high` and `memory.swap.max` during slurmd startup or
reconfigure, as `slurmd` never actually modifies these values.
* Fix reconfigure failure of slurmd when it has been started manually and the
`CoreSpecLimits` have been removed from `slurm.conf`.
* Set or reset CoreSpec limits when slurmd is reconfigured and it was started
with systemd.
* `switch/hpe_slingshot` - Make sure the slurmctld can free step VNIs after
the controller restarts or reconfigures while the job is running.
* Fix backup `slurmctld` failure on 2nd takeover.
* Changes from version 24.11.4
* `slurmctld`, `slurmrestd` - Avoid possible race condition that could have
caused the process to crash when the listener socket was closed while
accepting a new connection.
* `slurmrestd` - Avoid race condition that could have resulted in the address
logged for a UNIX socket being incorrect.
* `slurmrestd` - Fix parameters in the OpenAPI specification so that the
following endpoints have the `job_id` field: `GET /slurm/v0.0.40/jobs/state/`,
`GET /slurm/v0.0.41/jobs/state/`, `GET /slurm/v0.0.42/jobs/state/`,
`GET /slurm/v0.0.43/jobs/state/`.
* `slurmd` - Fix tracking of thread counts that could cause incoming
connections to be ignored after a burst of simultaneous incoming connections
that triggers the delayed response logic.
* Avoid unnecessary `SRUN_TIMEOUT` forwarding to `stepmgr`.
* Fix jobs being scheduled on higher weighted powered down nodes.
* Fix how backfill scheduler filters nodes from the available nodes based on
exclusive user and `mcs_label` requirements.
* `acct_gather_energy/{gpu,ipmi}` - Fix potential energy consumption
adjustment calculation underflow.
* `acct_gather_energy/ipmi` - Fix regression introduced in 24.05.5 (which
introduced the new way of preserving energy measurements through slurmd
restarts) when `EnergyIPMICalcAdjustment=yes`.
* Prevent `slurmctld` deadlock in the assoc mgr.
* Fix memory leak when `RestrictedCoresPerGPU` is enabled.
* Fix preemptor jobs not entering execution due to wrong calculation of
accounting policy limits.
* Fix certain job requests that were incorrectly denied with node
configuration unavailable error.
* `slurmd` - Avoid crash when slurmd has a communications failure with
`slurmstepd`.
* Fix memory leak when parsing YAML input.
* Prevent `slurmctld` from showing error message about `PreemptMode=GANG`
being a cluster-wide option for `scontrol update part` calls that don't
attempt to modify partition PreemptMode.
* Fix setting `GANG` preemption on partition when updating `PreemptMode` with
`scontrol`.
* Fix `CoreSpec` and `MemSpec` limits not being removed from previously
configured slurmd.
* Avoid race condition that could lead to a deadlock when `slurmd`,
`slurmstepd`, `slurmctld`, `slurmrestd` or `sackd` have a fatal event.
* Fix jobs using `--ntasks-per-node` and `--mem` being left pending forever
when the requested memory divided by the number of CPUs exceeds the
configured `MaxMemPerCPU` (see the sketches after this changelog).
* `slurmd` - Fix address logged upon new incoming RPC connection from
`INVALID` to the IP address.
* Fix memory leak when retrieving reservations. This affects `scontrol`,
`sinfo`, `sview`, and the following `slurmrestd` endpoints: `GET
/slurm/{any_data_parser}/reservation/{reservation_name}` and `GET
/slurm/{any_data_parser}/reservations`.
* Log a warning instead of a `debugflags=conmgr` gated log message when
deferring new incoming connections because the number of active connections
exceeds `conmgr_max_connections`.
* Avoid race condition that could result in the worker thread pool not
activating all threads at once after a reconfigure, resulting in lower
utilization of available CPU threads until enough internal activity wakes up
all threads in the worker pool.
* Avoid theoretical race condition that could result in new incoming RPC
socket connections being ignored after reconfigure.
* `slurmd` - Avoid race condition that could result in a state where new
incoming RPC connections will always be ignored.
* Add `ReconfigFlags=KeepNodeStateFuture` to restore saved `FUTURE` node
state on restart and reconfig instead of reverting to the `FUTURE` state.
This will be made the default in 25.05 (see the sketches after this
changelog).
* Fix case where hetjob submit would cause `slurmctld` to crash.
* Fix jobs using `--cpus-per-gpu` and `--mem` being left pending forever when
the requested memory divided by the number of CPUs exceeds the configured
`MaxMemPerCPU`.
* Enforce that jobs using `--mem` and several `--*-per-*` options do not
violate the `MaxMemPerCPU` in place.
* `slurmctld` - Fix use-cases of jobs incorrectly pending held when
`--prefer` features are not initially satisfied.
* `slurmctld` - Fix jobs incorrectly held when `--prefer` not satisfied in
some use-cases.
* Ensure `RestrictedCoresPerGPU` and `CoreSpecCount` don't overlap.
* Changes from version 24.11.3
* Fix database cluster ID generation not being random.
* Fix a regression in which `slurmd -G` gave no output.
* Fix a long-standing crash in `slurmctld` after updating a reservation with
an empty nodelist. The crash could occur after restarting slurmctld, or if
downing/draining a node in the reservation with the `REPLACE` or
`REPLACE_DOWN` flag.
* Avoid changing the process name to "`watch`" from the original daemon name,
which could potentially break some monitoring scripts.
* Avoid `slurmctld` being killed by `SIGALRM` due to race condition at
startup.
* Fix race condition in slurmrestd that resulted in "`Requested data_parser
plugin does not support OpenAPI plugin`" error being returned for valid
endpoints.
* Fix race between `task/cgroup` CPUset and `jobacctgather/cgroup`: the
former was removing the pid from the `task_X` cgroup directory, causing
memory limits to not be applied.
* If multiple partitions are requested, set the `SLURM_JOB_PARTITION` output
environment variable to the partition in which the job is running for
`salloc` and `srun`, in order to match the documentation and the behavior of
`sbatch` (see the sketches after this changelog).
* `srun` - Fixed wrongly constructed `SLURM_CPU_BIND` env variable that could
get propagated to downward srun calls in certain MPI environments, causing
launch failures.
* Don't print misleading errors for stepmgr enabled steps.
* `slurmrestd` - Avoid connection to slurmdbd for the following endpoints:
`GET /slurm/v0.0.41/jobs` and `GET /slurm/v0.0.41/job/{job_id}`.
* `slurmrestd` - Avoid connection to slurmdbd for the following endpoints:
`GET /slurm/v0.0.40/jobs` and `GET /slurm/v0.0.40/job/{job_id}`.
* `slurmrestd` - Fix possible memory leak when parsing arrays with
`data_parser/v0.0.40`.
* `slurmrestd` - Fix possible memory leak when parsing arrays with
`data_parser/v0.0.41`.
* `slurmrestd` - Fix possible memory leak when parsing arrays with
`data_parser/v0.0.42`.
* Changes from version 24.11.2
* Fix segfault when submitting `--test-only` jobs that can preempt.
* Fix regression introduced in 23.11 that prevented the following flags from
being added to a reservation on an update: `DAILY`, `HOURLY`, `WEEKLY`,
`WEEKDAY`, and `WEEKEND`.
* Fix crash and issues when evaluating a job's suitability to run on nodes
that already have suspended jobs.
* `slurmctld` will ensure that healthy nodes are not reported as
`UnavailableNodes` in job reason codes.
* Fix handling of jobs submitted to a current reservation with flags
`OVERLAP,FLEX` or `OVERLAP,ANY_NODES` when it overlaps nodes with a future
maintenance reservation. When a job submission had a time limit that
overlapped with the future maintenance reservation, it was rejected. Now the
job is accepted but stays pending with the reason "`ReqNodeNotAvail,
Reserved for maintenance`".
* `pam_slurm_adopt` - Avoid errors when explicitly setting some arguments to
the default value.
* Fix QOS preemption with `PreemptMode=SUSPEND`.
* `slurmdbd` - When changing a user's name, update lineage at the same time.
* Fix regression in 24.11 in which `burst_buffer.lua` does not inherit the
`SLURM_CONF` environment variable from `slurmctld` and fails to run if
slurm.conf is in a non-standard location.
* Fix memory leak in slurmctld if `select/linear` and the
`PreemptParameters=reclaim_licenses` options are both set in `slurm.conf`.
Regression in 24.11.1.
* Fix running jobs that requested multiple partitions potentially being set
to the wrong partition on restart.
* `switch/hpe_slingshot` - Fix compatibility with newer cxi drivers,
specifically when specifying `disable_rdzv_get`.
* Add the `ABORT_ON_FATAL` environment variable to capture a backtrace from
any `fatal()` message (see the sketches after this changelog).
* Fix printing invalid address in rate limiting log statement.
* `sched/backfill` - Fix node state `PLANNED` not being cleared from fully
allocated nodes during a backfill cycle.
* `select/cons_tres` - Fix future planning of jobs with `bf_licenses`.
* Prevent redundant "`on_data returned rc: Rate limit exceeded, please retry
momentarily`" error message from being printed in slurmctld logs.
* Fix loading non-default QOS on pending jobs from pre-24.11 state.
* Fix pending jobs displaying `QOS=(null)` when not explicitly requesting a
QOS.
* Fix segfault issue from job record with no `job_resrcs`.
* Fix failing `sacctmgr delete/modify/show` account operations with `where`
clauses.
* Fix regression in 24.11 in which Slurm daemons started catching and
ignoring the `SIGTSTP`, `SIGTTIN` and `SIGUSR1` signals that they previously
did not ignore. This also left slurmctld unable to shut down after a
`SIGTSTP`, because slurmscriptd caught the signal and stopped while
slurmctld ignored it. Unify and fix these situations and restore the
previous behavior for these signals.
* Document that `SIGQUIT` is no longer ignored by `slurmctld`, `slurmdbd`, and
slurmd in 24.11. As of 24.11.0rc1, `SIGQUIT` is identical to `SIGINT` and
`SIGTERM` for these daemons, but this change was not documented.
* Fix the scheduler not considering nodes marked for reboot without the
`ASAP` flag.
* Remove the `boot^` state on unexpected node reboot after return to service.
* Do not allow new jobs to start on a node which is being rebooted with the
flag `nextstate=resume`.
* Prevent a lower priority job from running after cancelling an ASAP reboot.
* Fix srun jobs starting on `nextstate=resume` rebooting nodes.
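The sketches below illustrate a few of the changes above; they are
illustrative examples under stated assumptions, not part of the upstream
release notes.
For the new `+inline_enums` data_parser flag, the OpenAPI specification can
be generated with enum arrays dumped inline rather than referenced via
`$ref` (assuming slurmrestd 24.11.5 with the `--generate-openapi-spec`
option available):

    # Dump the v0.0.42 OpenAPI spec with enums inlined instead of $ref'd
    slurmrestd -d v0.0.42+inline_enums --generate-openapi-spec > openapi.json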
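The new `destroy_retries` option is set through `SwitchParameters` in
`slurm.conf`; the retry count below is an illustrative value, not a
recommended default:

    # slurm.conf: retry CXI service destruction up to 5 times
    SwitchType=switch/hpe_slingshot
    SwitchParameters=destroy_retries=5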
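The `MaxMemPerCPU` fixes concern jobs whose per-node memory request, divided
across the allocated CPUs, exceeds the limit. With illustrative numbers,
assume `MaxMemPerCPU=2000` (MB) and a request such as (`./my_app` is a
placeholder):

    srun --ntasks-per-node=4 --mem=16000 ./my_app

This asks for 16000 / 4 = 4000 MB per CPU, which is over the limit; before
these fixes such jobs could stay pending forever instead of being handled
cleanly.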
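`ReconfigFlags=KeepNodeStateFuture` is a `slurm.conf` setting, so sites can
opt in now to the behavior that 25.05 will make the default:

    # slurm.conf: keep the saved state of FUTURE nodes across
    # slurmctld restart and reconfigure
    ReconfigFlags=KeepNodeStateFuture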
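With the `SLURM_JOB_PARTITION` fix, `salloc` and `srun` now report the
partition the job actually runs in, matching `sbatch` (the partition names
below are placeholders):

    # Submit to two partitions; the variable reflects the one chosen
    srun -p batch,debug -N1 sh -c 'echo $SLURM_JOB_PARTITION'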
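`ABORT_ON_FATAL` turns a `fatal()` exit into an abort, so a core dump can
capture the backtrace. A minimal sketch for debugging a controller running
in the foreground:

    # Enable core dumps, then abort (rather than exit) on any fatal() message
    ulimit -c unlimited
    ABORT_ON_FATAL=1 slurmctld -D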
## Patch Instructions:
To install this SUSE update use the SUSE recommended installation methods like
YaST online_update or "zypper patch".
Alternatively you can run the command listed for your product:
* HPC Module 12
zypper in -t patch SUSE-SLE-Module-HPC-12-2025-1757=1
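To verify afterwards (assuming the HPC Module 12 repositories are
configured), one can query the patch and the installed package:

    # Show whether the patch is needed or already installed
    zypper info -t patch SUSE-SLE-Module-HPC-12-2025-1757
    # Confirm the updated package version (expect 24.11.5-3.8.1)
    rpm -q slurm_24_11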
## Package List:
* HPC Module 12 (aarch64 x86_64)
* slurm_24_11-24.11.5-3.8.1
* slurm_24_11-torque-debuginfo-24.11.5-3.8.1
* slurm_24_11-munge-debuginfo-24.11.5-3.8.1
* slurm_24_11-node-24.11.5-3.8.1
* slurm_24_11-auth-none-debuginfo-24.11.5-3.8.1
* slurm_24_11-node-debuginfo-24.11.5-3.8.1
* slurm_24_11-pam_slurm-24.11.5-3.8.1
* libnss_slurm2_24_11-debuginfo-24.11.5-3.8.1
* slurm_24_11-sview-24.11.5-3.8.1
* slurm_24_11-lua-debuginfo-24.11.5-3.8.1
* libnss_slurm2_24_11-24.11.5-3.8.1
* slurm_24_11-devel-24.11.5-3.8.1
* slurm_24_11-slurmdbd-debuginfo-24.11.5-3.8.1
* slurm_24_11-torque-24.11.5-3.8.1
* slurm_24_11-munge-24.11.5-3.8.1
* slurm_24_11-sview-debuginfo-24.11.5-3.8.1
* slurm_24_11-plugins-debuginfo-24.11.5-3.8.1
* slurm_24_11-sql-24.11.5-3.8.1
* libpmi0_24_11-24.11.5-3.8.1
* slurm_24_11-slurmdbd-24.11.5-3.8.1
* perl-slurm_24_11-24.11.5-3.8.1
* slurm_24_11-debuginfo-24.11.5-3.8.1
* libpmi0_24_11-debuginfo-24.11.5-3.8.1
* slurm_24_11-auth-none-24.11.5-3.8.1
* slurm_24_11-cray-24.11.5-3.8.1
* libslurm42-debuginfo-24.11.5-3.8.1
* slurm_24_11-plugins-24.11.5-3.8.1
* slurm_24_11-sql-debuginfo-24.11.5-3.8.1
* slurm_24_11-lua-24.11.5-3.8.1
* slurm_24_11-pam_slurm-debuginfo-24.11.5-3.8.1
* perl-slurm_24_11-debuginfo-24.11.5-3.8.1
* libslurm42-24.11.5-3.8.1
* HPC Module 12 (noarch)
* slurm_24_11-doc-24.11.5-3.8.1
* slurm_24_11-webdoc-24.11.5-3.8.1
* slurm_24_11-config-man-24.11.5-3.8.1
* slurm_24_11-config-24.11.5-3.8.1
## References:
* https://www.suse.com/security/cve/CVE-2025-43904.html
* https://bugzilla.suse.com/show_bug.cgi?id=1243666