SUSE-RU-2022:4348-1: important: Recommended update for pdsh, slurm_22_05
sle-updates at lists.suse.com
Wed Dec 7 17:28:36 UTC 2022
SUSE Recommended Update: Recommended update for pdsh, slurm_22_05
______________________________________________________________________________
Announcement ID: SUSE-RU-2022:4348-1
Rating: important
References: PED-2305
Affected Products:
SUSE Linux Enterprise Module for HPC 12
______________________________________________________________________________
An update that has 0 recommended fixes and contains one
feature can now be installed.
Description:
This update for pdsh, slurm_22_05 fixes the following issues:
Slurm was updated to 22.05.5
- Fixes a number of moderate severity issues; the most notable are:
* Load hash plugin at slurmstepd launch time to prevent issues loading
the plugin at step completion if the Slurm installation is upgraded.
* Update nvml plugin to match the unique id format for MIG devices in
new Nvidia drivers.
* Fix multi-node step launch failure when nodes in the controller aren't
in natural order. This can happen with inconsistent node naming (such
as node15 and node052) or with dynamic nodes which can register in any
order.
* job_container/tmpfs - cleanup containers even when the .ns file isn't
mounted anymore.
* Wait up to PrologEpilogTimeout before shutting down slurmd to allow
prolog and epilog scripts to complete or timeout. Previously, slurmd
waited 120 seconds before timing out and killing prolog and epilog
scripts.
- Do not deduplicate the files of the testsuite Slurm configuration. This
directory is meant to be mounted over /etc/slurm and therefore must not
contain symlinks to the files in that directory.
- Fix a potential security vulnerability in the test package (bsc#1201674,
CVE-2022-31251).
- Update to 22.05.2 with the following fixes:
* Fix regression which allowed the oversubscription of licenses.
* Fix a segfault in slurmctld when requesting gres in job arrays.
- Allow login as user 'slurm'. This allows admins to run certain
privileged commands more easily without becoming root (see the example
below).
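As a minimal sketch (assuming a standard setup where SlurmUser=slurm and
that the slurm account now has a usable login shell), an administrator
could run privileged maintenance commands without staying root:

    # become the slurm user (root can switch without a password)
    su - slurm
    # run privileged Slurm commands, e.g. re-read slurm.conf
    scontrol reconfigure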
- Update to 22.05.0 with the following changes:
- Support for dynamic node addition and removal
- Support for native Linux cgroup v2 operation
- Newly added plugins to support HPE Slingshot 11 networks
(switch/hpe_slingshot), and Intel Xe GPUs (gpu/oneapi)
- Added new acct_gather_interconnect/sysfs plugin to collect statistics
from arbitrary network interfaces.
- Expanded and synced set of environment variables available in the
Prolog/Epilog/PrologSlurmctld/EpilogSlurmctld scripts.
- New "--prefer" option to job submissions to allow for a "soft
constraint" request to influence node selection.
- Optional support for license planning in the backfill scheduler with the
"bf_licenses" option in SchedulerParameters (see the example after this
list).
- Add a comment about the CommunicationParameters=block_null_hash option
to warn users who migrate.
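An illustrative sketch of two of the new 22.05 features listed above (the
feature name "fast" and the license name "matlab" are made-up placeholders;
adapt to the site's configuration):

    # slurm.conf: let the backfill scheduler plan around license availability
    SchedulerParameters=bf_licenses

    # Job submission: prefer nodes with the "fast" feature as a soft
    # constraint (unlike --constraint, the job still starts without it)
    sbatch --prefer=fast --licenses=matlab:1 job.sh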
- Update to 21.08.8 which fixes CVE-2022-29500 (bsc#1199278),
CVE-2022-29501 (bsc#1199279), and CVE-2022-29502 (bsc#1199281).
- Added 'CommunicationParameters=block_null_hash' to slurm.conf; please
add this parameter to existing configurations (see the snippet below).
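A minimal slurm.conf snippet for the parameter mentioned above (sketch only;
enable it once all daemons run a release that understands it):

    # slurm.conf: require message hashes, rejecting unsigned (null-hash)
    # internal traffic
    CommunicationParameters=block_null_hash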
- Update to 21.08.7 with the following changes:
* openapi/v0.0.37 - correct calculation for bf_queue_len_mean in /diag.
* Avoid shrinking a reservation when overlapping with downed nodes.
* Only check TRES limits against current usage for TRES requested by the
job.
* Do not allocate shared gres (MPS) in whole-node allocations
* Constrain slurmstepd to job/step cgroup like in previous versions of
Slurm.
* Fix warnings on 32-bit compilers related to printf() formats.
* Fix reconfigure issues after disabling/reenabling the GANG PreemptMode.
* Fix race condition where a cgroup was being deleted while another step
was creating it.
* Set the slurmd port correctly when running in multi-slurmd mode.
* Fix FAIL mail not being sent if a job was cancelled due to preemption.
* slurmrestd - move debug logs for HTTP handling to be gated by
debugflag NETWORK to avoid unnecessary logging of communication
contents.
* Fix issue with bad memory access when shrinking running steps.
* Fix various issues with internal job accounting with GRES when jobs
are shrunk.
* Fix ipmi polling on slurmd reconfig or restart.
* Fix srun crash when reserved ports are being used and het step fails
to launch.
* openapi/dbv0.0.37 - fix DELETE execution path on /user/{user_name}.
* slurmctld - Properly requeue all components of a het job if
PrologSlurmctld fails.
* rlimits - remove final calls to limit nofiles to 4096 but to instead
use the max possible nofiles in slurmd and slurmdbd.
* Allow the DBD agent to load large messages (up to MAX_BUF_SIZE) from
state.
* Fix potential deadlock during slurmctld restart when there is a
completing job.
* slurmstepd - reduce user requested soft rlimits when they are above
max hard rlimits to avoid rlimit request being completely ignored and
processes using default limits.
* Fix Slurm user commands displaying available features as active
features when no features were active.
* Don't power down nodes that are rebooting.
* Clear pending node reboot on power down request.
* Ignore node registrations while node is powering down.
* Don't reboot any node that is powering down or already powered down.
* Don't allow a node to reboot if it's marked for power down.
* Fix issuing reboot and downing when rebooting a powering up node.
* Clear DRAIN on node after failing to resume before ResumeTimeout.
* Prevent repeating power down if node fails to resume before
ResumeTimeout.
* Fix federated cloud node communication with srun and cloud_dns.
* Fix jobs being scheduled on nodes marked to be powered_down when idle.
* Fix problem where a privileged user could not view array tasks
specified by <array_job_id>_<task_id> when PrivateData had the jobs
value set.
- Changes in Slurm 21.08.6
* Fix plugin_name definitions in a number of plugins to improve logging.
* Close sbcast file transfers when job is cancelled.
* scrontab - fix handling of --gpus and --ntasks-per-gpu options.
* sched/backfill - fix job_queue_rec_t memory leak.
* Fix magnetic reservation logic in both main and backfill schedulers.
* job_container/tmpfs - fix memory leak when using InitScript.
* slurmrestd / openapi - fix memory leaks.
* Fix slurmctld segfault due to job array resv_list double free.
* Fix multi-reservation job testing logic.
* Fix slurmctld segfault due to insufficient job reservation parse
validation.
* Fix main and backfill schedulers handling for already rejected job
array.
* sched/backfill - restore resv_ptr after yielding locks.
* acct_gather_energy/xcc - appropriately close and destroy the IPMI
context.
* Protect slurmstepd from making multiple calls to the cleanup logic.
* Prevent slurmstepd segfault at cleanup time in mpi_fini().
* Fix slurmctld sometimes hanging if shutdown while PrologSlurmctld or
EpilogSlurmctld were running and PrologEpilogTimeout is set in
slurm.conf.
* Fix affinity of the batch step if batch host is different than the
first node in the allocation.
* slurmdbd - fix segfault after multiple failover/failback operations.
* Fix jobcomp filetxt job selection condition.
* Fix -f flag of sacct not being used.
* Select cores for job steps according to the socket distribution.
Previously, sockets were always filled before selecting cores from the
next socket.
* Keep node in Future state if epilog completes while in Future state.
* Fix erroneous --constraint behavior by preventing multiple sets of
brackets.
* Make ResetAccrueTime update the job's accrue_time to now.
* Fix sattach initialization with configless mode.
* Revert packing limit checks affecting pmi2.
* sacct - fixed assertion failure when using -c option and a federation
display
* Fix issue that allowed steps to overallocate the job's memory.
* Fix the sanity check mode of AutoDetect so that it actually works.
* Fix deallocated nodes that didn't actually launch a job from waiting
for Epilogslurmctld to complete before clearing completing node's
state.
* Job should be in a completing state if EpilogSlurmctld is still running
when being requeued.
* Fix job not being requeued properly if all node epilogs completed
before EpilogSlurmctld finished.
* Keep job completing until EpilogSlurmctld is completed even when
"downing" a node.
* Fix handling reboot with multiple job features.
* Fix nodes getting powered down when creating new partitions.
* Fix bad bit_realloc which potentially could lead to bad memory access.
* slurmctld - remove limit on the number of open files.
* Fix bug where job_state file of size above 2GB wasn't saved without
any error message.
* Fix various issues with no_consume gres.
* Fix regression in 21.08.0rc1 where job steps failed to launch on
systems that reserved a CPU in a cgroup outside of Slurm (for example,
on systems with WekaIO).
* Fix OverTimeLimit not being reset on scontrol reconfigure when it is
removed from slurm.conf.
* serializer/yaml - use dynamic buffer to allow creation of YAML outputs
larger than 1MiB.
* Fix minor memory leak affecting openapi users at process termination.
* Fix batch jobs not resolving the username when nss_slurm is enabled.
* slurmrestd - Avoid slurmrestd ignoring invalid HTTP method if the
response serialized without error.
* openapi/dbv0.0.37 - Correct conditional that caused the diag output to
give an internal server error status on success.
* Make --mem-bind=sort work with task/affinity.
* Fix sacctmgr to set MaxJobsAccruePer{User|Account} and MinPrioThres in
'sacctmgr add qos'; 'sacctmgr modify' already worked correctly.
* job_container/tmpfs - avoid printing extraneous error messages in
Prolog and Epilog, and when the job completes.
* Fix step CPU memory allocation with --threads-per-core without --exact.
* Remove implicit --exact when --threads-per-core or
--hint=nomultithread is used.
* Do not allow a step to request more threads per core than the
allocation did.
* Remove implicit --exact when --cpus-per-task is used.
- Update to 21.08.5 with the following changes:
* Fix issue where typeless GRES node updates were not immediately
reflected.
* Fix setting the default scrontab job working directory so that it is the
home of the selected user (-u <user>) and not that of root or the
SlurmUser editor.
* Fix stepd not respecting SlurmdSyslogDebug.
* Fix concurrency issue with squeue.
* Fix job start time not being reset after launch when job is packed
onto already booting node.
* Fix updating SLURM_NODE_ALIASES for jobs packed onto powering up nodes.
* Cray - Fix issues with starting hetjobs.
* auth/jwks - Print fatal() message when jwks is configured but file
could not be opened.
* If sacctmgr has an association with an unknown qos as the default qos
print 'UNKN*###' instead of leaving a blank name.
* Correctly determine task count when giving --cpus-per-gpu, --gpus and
--ntasks-per-node without a task count.
* slurmctld - Fix places where the global last_job_update was not being
set to the time of update when a job's reason and description were
updated.
* slurmctld - Fix case where a job submitted with more than one
partition would not have its reason updated while waiting to start.
* Fix memory leak in node feature rebooting.
* Fix time limit permanently set to 1 minute by backfill for job array
tasks higher than the first with the QOS NoReserve flag and PreemptMode
configured.
* Fix sacct -N to show jobs that started in the current second
* Fix issue on running steps where both SLURM_NTASKS_PER_TRES and
SLURM_NTASKS_PER_GPU are set.
* Handle oversubscription request correctly when also requesting
--ntasks-per-tres.
* Correctly detect when a step requests bad gres inside an allocation.
* slurmstepd - Correct possible deadlock when UnkillableStepTimeout
triggers.
* srun - use maximum number of open files while handling job I/O.
* Fix writing to Xauthority files on root_squash NFS exports, which was
preventing X11 forwarding from completing setup.
* Fix regression in 21.08.0rc1 that broke --gres=none.
* Fix srun --cpus-per-task and --threads-per-core not implicitly setting
--exact. It was meant to work this way in 21.08.
* Fix regression in 21.08.0 that broke dynamic future nodes.
* Fix dynamic future nodes remembering active state on restart.
* Fix powered down nodes getting stuck in COMPLETING+POWERED_DOWN when
job is cancelled before nodes are powering up.
- Updated to 21.08.4, which fixes CVE-2021-43337, an issue only present in
the 21.08 tree.
* CVE-2021-43337: For sites using the new
AccountingStoreFlags=job_script and/or job_env
options, an issue was reported with the access control rules in
SlurmDBD that will permit users to request job scripts and
environment files that they should not have access to.
(Scripts/environments are meant to only be accessible by user
accounts with administrator privileges, by account coordinators for
jobs submitted under their account, and by the user themselves.)
- Changes from 21.08.3:
* This includes a number of fixes since the last release a month ago,
including one critical fix to prevent a communication issue between
slurmctld and slurmdbd for sites that have started using the new
AccountingStoreFlags=job_script functionality.
- Utilize the sysusers infrastructure to set up the user/group slurm. For
munge authentication slurm should have a fixed UID across all nodes,
including the management server; set it to 120 (see the sketch below).
- Limit firewalld service definitions to SUSE versions >= 15.
- Added service definitions for firewalld (JSC#SLE-22741).
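A sketch of what the corresponding systemd-sysusers entry for the fixed UID
could look like (illustrative only; the file name and description shipped by
the package may differ):

    # /usr/lib/sysusers.d/slurm.conf (hypothetical path)
    # type  name   id    GECOS
    u       slurm  120   "SLURM workload manager"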
- Update to 21.08.2:
- major change:
* Removed support for the TaskAffinity=yes option in cgroup.conf. Please
consider using "TaskPlugin=task/cgroup,task/affinity" in slurm.conf
instead (see the configuration sketch after this list).
- minor changes and bugfixes:
* slurmctld - fix how the max number of cores on a node in a partition is
calculated when the partition contains multi-socket nodes. This in turn
corrects certain jobs' node count estimations displayed client-side.
* job_submit/cray_aries - fix "craynetwork" GRES specification after
changes introduced in 21.08.0rc1 that made TRES always have a type
prefix.
* Ignore nonsensical check in the slurmd for [Pro|Epi]logSlurmctld.
* Fix writing to stderr/syslog when systemd runs slurmctld in the
foreground.
* Fix issue with updating job started with node range.
* Fix issue with nodes not clearing state in the database when the
slurmctld is started with clean-start.
* Fix hetjob components > 1 timing out due to InactiveLimit.
* Fix sprio printing -nan for normalized association priority if
PriorityWeightAssoc was not defined.
* Disallow FirstJobId=0.
* Preserve job start info in the database for a requeued job that hadn't
registered the first time in the database yet.
* Only send one message on prolog failure from the slurmd.
* Remove support for TaskAffinity=yes in cgroup.conf.
* accounting_storage/mysql - fix issue where querying jobs via sacct
--whole-hetjob=yes or slurmrestd (which automatically includes this
flag) could in some cases return more records than expected.
* Fix issue for preemption of job array task that makes afterok
dependency fail. Additionally, send emails when requeueing happens due
to preemption.
* Fix sending requeue mail type.
* Properly resize a job's GRES bitmaps and counts when resizing the job.
* Fix node being able to transition to CLOUD state from non-cloud state.
* Fix regression introduced in 21.08.0rc1 which broke a step's ability
to inherit GRES from the job when the step didn't request GRES but the
job did.
* Fix errors in logic when picking nodes based on bracketed anded
constraints. This also enforces the requirement to have a count when
using such constraints.
* Handle job resize better in the database.
* Exclude currently running, resized jobs from the runaway jobs list.
* Make it possible to shrink a job more than once.
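Regarding the TaskAffinity removal noted at the top of this entry, a
configuration sketch (the replacement shown follows the upstream
recommendation; adapt the plugin list to the site's needs):

    # cgroup.conf: this line is no longer supported and must be removed
    #TaskAffinity=yes

    # slurm.conf: use the task plugins instead
    TaskPlugin=task/cgroup,task/affinity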
- Moved the PAM module from /lib64 to /usr/lib64 via the %_pam_moduledir
macro, which fixes bsc#1191095.
- Updated to 21.08.1 with the following bug fixes:
* Fix potential memory leak if a problem happens while allocating GRES
for a job.
* If an overallocation of GRES happens, terminate the creation of the job.
* AutoDetect=nvml: Fatal if no devices found in MIG mode.
* Print federation and cluster sacctmgr error messages to stderr.
* Fix off by one error in --gpu-bind=mask_gpu.
* Add --gpu-bind=none to disable gpu binding when using --gpus-per-task.
* Handle the burst buffer state "alloc-revoke" which previously would
not display in the job correctly.
* Fix issue in the slurmstepd SPANK prolog/epilog handler where
configuration values were used before being initialized.
* Restore a step's ability to utilize all of an allocation's memory if
--mem=0.
* Fix --cpu-bind=verbose garbage taskid.
* Fix cgroup task affinity issues from garbage taskid info.
* Make gres_job_state_validate() client logging behavior match that before
commit 44466a4641.
* Fix steps with --hint overriding an allocation with --threads-per-core.
* Require requesting a GPU if --mem-per-gpu is requested.
* Return error early if a job is requesting --ntasks-per-gpu and no gpus
or task count.
* Properly clear out pending step if unavailable to run with available
resources.
* Kill all processes spawned by burst_buffer.lua, including descendants.
* openapi/v0.0.{35,36,37} - Avoid setting default values of min_cpus,
job name, cwd, mail_type, and contiguous on job update.
* openapi/v0.0.{35,36,37} - Clear user hold on job update if hold=false.
* Prevent CRON_JOB flag from being cleared when loading job state.
* sacctmgr - Fix deleting WCKeys when not specifying a cluster.
* Fix getting memory for a step when the first node in the step isn't
the first node in the allocation.
* Make SelectTypeParameters=CR_Core_Memory default for cons_tres and
cons_res.
* Correctly handle mutex unlocks in the gres code if failures happen.
* Give better error message if -m plane is given with no size.
* Fix --distribution=arbitrary for salloc.
* Fix jobcomp/script regression introduced in 21.08.0rc1 0c75b9ac9d.
* Only send the batch node in the step_hostlist in the job credential.
* When setting affinity for the batch step don't assume the batch host
is node 0.
* In task/affinity better checking for node existence when laying out
affinity.
* slurmrestd - fix job submission with auth/jwt.
- Make configure arg '--with-pmix' conditional.
- Move openapi plugins to package slurm-restd.
- Updated to 21.08.0; major changes:
* A new "AccountingStoreFlags=job_script" option to store the job
scripts directly in SlurmDBD.
* Added "sacct -o SubmitLine" format option to get the submit line
of a job/step.
* Changes to the node state management so that nodes are marked as
PLANNED instead of IDLE if the scheduler is still accumulating
resources while waiting to launch a job on them.
* RS256 token support in auth/jwt.
* Overhaul of the cgroup subsystems to simplify operation, mitigate a
number
of inherent race conditions, and prepare for future cgroup v2 support.
* Further improvements to cloud node power state management.
* A new child process of the Slurm controller called "slurmscriptd"
responsible for executing PrologSlurmctld and EpilogSlurmctld scripts,
which significantly reduces performance issues associated with
enabling those options.
* A new burst_buffer/lua plugin allowing for site-specific asynchronous
job data management.
* Fixes to the job_container/tmpfs plugin to allow the slurmd process to
be restarted while the job is running without issue.
* Added json/yaml output to sacct, squeue, and sinfo commands.
* Added a new node_features/helpers plugin to provide a generic way to
change settings on a compute node across a reboot.
* Added support for automatically detecting and broadcasting shared
libraries for an executable launched with "srun --bcast".
* Added initial OCI container execution support with a new --container
option to sbatch and srun.
* Improved "configless" support by allowing multiple control servers to
be specified through the slurmd --conf-server option, and send
additional configuration files at startup including cli_filter.lua.
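A few of the 21.08 features above, sketched as commands (the job ID and the
bundle path are placeholders; the sacct queries only work for jobs submitted
after AccountingStoreFlags was set):

    # slurm.conf: store job scripts in SlurmDBD
    AccountingStoreFlags=job_script

    # Retrieve the stored batch script and the submit line of a job
    sacct -j 12345 --batch-script
    sacct -j 12345 -o JobID,SubmitLine

    # JSON output and OCI container execution
    squeue --json
    srun --container=/path/to/oci-bundle hostname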
Changes in pdsh:
- Preparing pdsh for Slurm 22.05.
* No later version of Slurm builds on 32-bit systems.
Patch Instructions:
To install this SUSE Recommended Update, use the SUSE recommended installation
methods like YaST online_update or "zypper patch".
Alternatively, you can run the command listed for your product:
- SUSE Linux Enterprise Module for HPC 12:
zypper in -t patch SUSE-SLE-Module-HPC-12-2022-4348=1
Package List:
- SUSE Linux Enterprise Module for HPC 12 (aarch64 x86_64):
libnss_slurm2_22_05-22.05.5-3.3.5
libnss_slurm2_22_05-debuginfo-22.05.5-3.3.5
libpmi0_22_05-22.05.5-3.3.5
libpmi0_22_05-debuginfo-22.05.5-3.3.5
libslurm38-22.05.5-3.3.5
libslurm38-debuginfo-22.05.5-3.3.5
pdsh-2.34-7.35.2
pdsh-debuginfo-2.34-7.35.2
pdsh-debugsource-2.34-7.35.2
pdsh-dshgroup-2.34-7.35.2
pdsh-dshgroup-debuginfo-2.34-7.35.2
pdsh-genders-2.34-7.35.2
pdsh-genders-debuginfo-2.34-7.35.2
pdsh-machines-2.34-7.35.2
pdsh-machines-debuginfo-2.34-7.35.2
pdsh-netgroup-2.34-7.35.2
pdsh-netgroup-debuginfo-2.34-7.35.2
pdsh-slurm-2.34-7.35.2
pdsh-slurm-debuginfo-2.34-7.35.2
pdsh-slurm_18_08-2.34-7.35.3
pdsh-slurm_18_08-debuginfo-2.34-7.35.3
pdsh-slurm_20_02-2.34-7.35.3
pdsh-slurm_20_02-debuginfo-2.34-7.35.3
pdsh-slurm_20_11-2.34-7.35.3
pdsh-slurm_20_11-debuginfo-2.34-7.35.3
pdsh-slurm_22_05-2.34-7.35.5
pdsh-slurm_22_05-debuginfo-2.34-7.35.5
pdsh_slurm_18_08-debugsource-2.34-7.35.3
pdsh_slurm_20_02-debugsource-2.34-7.35.3
pdsh_slurm_20_11-debugsource-2.34-7.35.3
pdsh_slurm_22_05-debugsource-2.34-7.35.5
perl-slurm_22_05-22.05.5-3.3.5
perl-slurm_22_05-debuginfo-22.05.5-3.3.5
slurm_22_05-22.05.5-3.3.5
slurm_22_05-auth-none-22.05.5-3.3.5
slurm_22_05-auth-none-debuginfo-22.05.5-3.3.5
slurm_22_05-debuginfo-22.05.5-3.3.5
slurm_22_05-debugsource-22.05.5-3.3.5
slurm_22_05-devel-22.05.5-3.3.5
slurm_22_05-lua-22.05.5-3.3.5
slurm_22_05-lua-debuginfo-22.05.5-3.3.5
slurm_22_05-munge-22.05.5-3.3.5
slurm_22_05-munge-debuginfo-22.05.5-3.3.5
slurm_22_05-node-22.05.5-3.3.5
slurm_22_05-node-debuginfo-22.05.5-3.3.5
slurm_22_05-pam_slurm-22.05.5-3.3.5
slurm_22_05-pam_slurm-debuginfo-22.05.5-3.3.5
slurm_22_05-plugins-22.05.5-3.3.5
slurm_22_05-plugins-debuginfo-22.05.5-3.3.5
slurm_22_05-slurmdbd-22.05.5-3.3.5
slurm_22_05-slurmdbd-debuginfo-22.05.5-3.3.5
slurm_22_05-sql-22.05.5-3.3.5
slurm_22_05-sql-debuginfo-22.05.5-3.3.5
slurm_22_05-sview-22.05.5-3.3.5
slurm_22_05-sview-debuginfo-22.05.5-3.3.5
slurm_22_05-torque-22.05.5-3.3.5
slurm_22_05-torque-debuginfo-22.05.5-3.3.5
- SUSE Linux Enterprise Module for HPC 12 (noarch):
slurm_22_05-config-22.05.5-3.3.5
slurm_22_05-config-man-22.05.5-3.3.5
slurm_22_05-doc-22.05.5-3.3.5
slurm_22_05-webdoc-22.05.5-3.3.5