SUSE-RU-2018:2706-1: moderate: Recommended update for slurm

sle-updates at lists.suse.com
Thu Sep 13 13:07:56 MDT 2018


   SUSE Recommended Update: Recommended update for slurm
______________________________________________________________________________

Announcement ID:    SUSE-RU-2018:2706-1
Rating:             moderate
References:         #1084917 #1103561 
Affected Products:
                    SUSE Linux Enterprise Module for HPC 15
______________________________________________________________________________

   An update that has two recommended fixes can now be
   installed.

Description:

   This update for slurm provides version 17.11.9 and fixes the following
   issues:

   - Ensure slurmctld is started only after remote filesystems have become
     available, as required when StateSaveLocation is on a remote shared
     filesystem. (bsc#1103561)
   - Fix race in the slurmctld backup controller that prevents it from
     cleaning up allocations on nodes properly after failing over.
     (bsc#1084917)
   - Fix segfault in slurmctld when a job's node bitmap is NULL during a
     scheduling cycle.
   - Remove erroneous unlock in acct_gather_energy/ipmi.
   - Enable support for hwloc version 2.0.1.
   - Fix 'srun -q' (--qos) option handling.
   - Fix socket communication issue that can lead to lost task completion
     messages, leaving the srun process permanently stuck.
   - Avoid node layout fragmentation if running with a fixed CPU count but
     without Sockets and CoresPerSocket defined.
   - burst_buffer/cray: Fix datawarp swap default pool overriding jobdw.
   - Fix incorrect job priority assignment for a multi-partition job with
     different PriorityTier settings on the partitions.
   - Fix sinfo to print correct node state.
   - Do not allocate nodes that were marked down because they did not
     respond within ResumeTimeout.
   - task/cray plugin: Search for "mems" cgroup information in the file
     "cpuset.mems" first, then fall back to the file "mems".
   - Fix an uninitialized variable in the ipmi profile debug code.
   - PMIx: Fix sending of inline messages in direct-connect mode.
   - MySQL: Make sure all fields are handled when loading an archive dump.
   - Allow a job_submit plugin to change the admin_comment field during
     job_submit_plugin_modify().
   - job_submit/lua: Fix access into reservation table.
   - MySQL: Prevent deadlock caused by archive logic locking reads.
   - Don't enforce MaxQueryTimeRange when requesting specific jobs.
   - Modify --test-only logic to properly support jobs submitted to more than
     one partition.
   - Prevent slurmctld from aborting when attempting to set a non-existent
     QOS as def_qos_id.
   - Add new job dependency type of "afterburstbuffer". The pending job will
     be delayed until the specified job has completed execution and its
     burst buffer stage-out has completed.
   - Reorder proctrack/task plugin loading in slurmstepd to match that of
     slurmd, avoiding a race condition that calling the task plugin before
     the proctrack plugin can introduce.
   - Prevent reboot of a busy KNL node when requesting inactive features.
   - Reinitialize previously adjusted job members to their original values
     when validating the job memory in multi-partition requests.
   - Prevent _step_signal() from always returning SLURM_SUCCESS.
   - Combine active and available node feature change logs on one line rather
     than one line per node for performance reasons.
   - Prevent occasional leaking of freezer cgroups.
   - Fix potential segfault when closing the mpi/pmi2 plugin.
   - Fix --exclusive=[user|mcs] to work correctly with preemption or when a
     job requests a specific list of hosts.
   - mpi/pmix: Fix cancellation of collectives.
   - SlurmDBD: Improve error message handling on archive load failure.
   - Fix incorrect locking when deleting reservations.
   - Fix incorrect locking when setting up the power save module.
   - Fix setting format output length for squeue when showing array jobs.
   - Add xstrstr function.
   - Fix printing of the --hint options in sbatch and salloc --help output.
   - Prevent possible divide by zero in _validate_time_limit().
   - Add Delegate=yes to the slurmd.service file to prevent systemd from
     interfering with the jobs' cgroup hierarchies.
   - Change the backlog argument to the listen() syscall within srun to 4096
     to match the rest of the code and avoid communication problems at
     scale.
   - Recommend slurm-munge for slurm-slurmdbd.
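
   For the StateSaveLocation ordering fix (bsc#1103561), a site that keeps
   its own override of slurmctld.service, or that wants the ordering tied to
   the exact mount holding the state directory, could use a systemd drop-in
   along the lines of this sketch. It is not taken from the package itself:
   the drop-in file name and the path /shared/slurm/state are hypothetical
   placeholders for the site's own StateSaveLocation.

```
# /etc/systemd/system/slurmctld.service.d/statesave-mount.conf
# (hypothetical file name; replace the path below with the site's
# actual StateSaveLocation)
[Unit]
# Do not start slurmctld before remote filesystems have been mounted.
After=remote-fs.target
# Adds Requires= and After= dependencies on the mount unit(s) needed
# to access this (hypothetical) StateSaveLocation path.
RequiresMountsFor=/shared/slurm/state
```

   After creating or editing a drop-in, run "systemctl daemon-reload" so
   that systemd picks up the change.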
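
   The Delegate=yes change to slurmd.service is a single [Service] setting.
   Sites that replace the packaged slurmd.service with a local copy need to
   carry the setting themselves; a minimal drop-in sketch, using a
   hypothetical file name, might look like this:

```
# /etc/systemd/system/slurmd.service.d/delegate.conf (hypothetical name)
[Service]
# Delegate the cgroup subtree below slurmd.service to slurmd, so that
# systemd does not interfere with the cgroups slurmd creates for jobs.
Delegate=yes
```

   The new setting takes effect once "systemctl daemon-reload" has been run
   and slurmd has been restarted.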


Patch Instructions:

   To install this SUSE Recommended Update, use the SUSE recommended
   installation methods, such as YaST online_update or "zypper patch".

   Alternatively you can run the command listed for your product:

   - SUSE Linux Enterprise Module for HPC 15:

      zypper in -t patch SUSE-SLE-Module-HPC-15-2018-1898=1



Package List:

   - SUSE Linux Enterprise Module for HPC 15 (aarch64 x86_64):

      libpmi0-17.11.9-6.9.1
      libpmi0-debuginfo-17.11.9-6.9.1
      libslurm32-17.11.9-6.9.1
      libslurm32-debuginfo-17.11.9-6.9.1
      perl-slurm-17.11.9-6.9.1
      perl-slurm-debuginfo-17.11.9-6.9.1
      slurm-17.11.9-6.9.1
      slurm-auth-none-17.11.9-6.9.1
      slurm-auth-none-debuginfo-17.11.9-6.9.1
      slurm-config-17.11.9-6.9.1
      slurm-debuginfo-17.11.9-6.9.1
      slurm-debugsource-17.11.9-6.9.1
      slurm-devel-17.11.9-6.9.1
      slurm-doc-17.11.9-6.9.1
      slurm-lua-17.11.9-6.9.1
      slurm-lua-debuginfo-17.11.9-6.9.1
      slurm-munge-17.11.9-6.9.1
      slurm-munge-debuginfo-17.11.9-6.9.1
      slurm-node-17.11.9-6.9.1
      slurm-node-debuginfo-17.11.9-6.9.1
      slurm-pam_slurm-17.11.9-6.9.1
      slurm-pam_slurm-debuginfo-17.11.9-6.9.1
      slurm-plugins-17.11.9-6.9.1
      slurm-plugins-debuginfo-17.11.9-6.9.1
      slurm-slurmdbd-17.11.9-6.9.1
      slurm-slurmdbd-debuginfo-17.11.9-6.9.1
      slurm-sql-17.11.9-6.9.1
      slurm-sql-debuginfo-17.11.9-6.9.1
      slurm-torque-17.11.9-6.9.1
      slurm-torque-debuginfo-17.11.9-6.9.1


References:

   https://bugzilla.suse.com/1084917
   https://bugzilla.suse.com/1103561