[Deepsea-users] Adding OSDs using Salt

Robert Grosschopff Robert.Grosschopff at suse.com
Thu Mar 16 09:13:39 MDT 2017


What I did at first was simply to add the new profile (..2Disk50GB...)
to policy.cfg:

profile-1Disk50GB-1/cluster/ses4-[1234]*.sls
profile-1Disk50GB-1/stack/default/ceph/minions/ses4-[1234]*.yml
profile-2Disk50GB-1/cluster/ses4-[1234]*.sls
profile-2Disk50GB-1/stack/default/ceph/minions/ses4-[1234]*.yml

Then I ran stage.2:
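
For reference, the corresponding salt-run call (the same pattern as the
stage.3 invocation further down) is:

sudo salt-run state.orch ceph.stage.2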

Succeeded: 12 (changed=4)
Failed:     0

salt '*' pillar.items shows:

    storage:
        ----------
        data+journals:
        osds:
            - /dev/vdb
            - /dev/vdb
            - /dev/vdc

So, vdb was added again.
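
A quick way to see this per minion (assuming the storage dictionary sits
at the top level of the pillar, as the pillar.items output above
suggests) is:

salt 'ses4-*' pillar.get storage:osds

Each device should appear exactly once per node; here /dev/vdb is listed
twice.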

stage.3 throws a lot of failure messages since it cannot add vdb again:

cephadm@salt:~> sudo salt-run state.orch ceph.stage.3
firewall                 : disabled
fsid                     : valid
public_network           : valid
public_interface         : valid
cluster_network          : valid
cluster_interface        : valid
monitors                 : valid
storage                  : valid
master_role              : valid
mon_host                 : valid
mon_initial_members      : valid
time_server              : disabled
fqdn                     : valid
[WARNING ] Could not write out jid file for job 20170316122026087326. Retrying.
[WARNING ] Could not write out jid file for job 20170316122026087326. Retrying.
[WARNING ] Could not write out jid file for job 20170316122026087326. Retrying.
[WARNING ] Could not write out jid file for job 20170316122026087326. Retrying.
[WARNING ] Could not write out jid file for job 20170316122026087326. Retrying.
[ERROR   ] prep_jid could not store a jid after 5 tries.
[ERROR   ] Could not store job cache info. Job details for this run may be unavailable.
[ERROR   ] Run failed on minions: ses4-3.local.site, ses4-4.local.site, ses4-1.local.site, ses4-2.local.site
Failures:
    ses4-3.local.site:
        Data failed to compile:
    ----------
        Rendering SLS 'base:ceph.osd.default' failed: Conflicting ID 'prepare /dev/vdb'
    ses4-4.local.site:
        Data failed to compile:
    ----------
        Rendering SLS 'base:ceph.osd.default' failed: Conflicting ID 'prepare /dev/vdb'
    ses4-1.local.site:
        Data failed to compile:
    ----------
        Rendering SLS 'base:ceph.osd.default' failed: Conflicting ID 'prepare /dev/vdb'
    ses4-2.local.site:
        Data failed to compile:
    ----------
        Rendering SLS 'base:ceph.osd.default' failed: Conflicting ID 'prepare /dev/vdb'

salt.local.site_master:
  Name: packages - Function: salt.state - Result: Clean Started: - 12:20:27.760260 Duration: 1259.428 ms
  Name: configuration check - Function: salt.state - Result: Clean Started: - 12:20:29.019844 Duration: 177.061 ms
  Name: configuration - Function: salt.state - Result: Clean Started: - 12:20:29.197064 Duration: 598.674 ms
  Name: admin - Function: salt.state - Result: Clean Started: - 12:20:29.795890 Duration: 190.272 ms
  Name: monitors - Function: salt.state - Result: Changed Started: - 12:20:29.986315 Duration: 454.657 ms
  Name: osd auth - Function: salt.state - Result: Changed Started: - 12:20:30.441126 Duration: 332.438 ms
----------
          ID: storage
    Function: salt.state
      Result: False
     Comment: Run failed on minions: ses4-3.local.site, ses4-4.local.site, ses4-1.local.site, ses4-2.local.site
              Failures:
                  ses4-3.local.site:
                      Data failed to compile:
                  ----------
                      Rendering SLS 'base:ceph.osd.default' failed: Conflicting ID 'prepare /dev/vdb'
                  ses4-4.local.site:
                      Data failed to compile:
                  ----------
                      Rendering SLS 'base:ceph.osd.default' failed: Conflicting ID 'prepare /dev/vdb'
                  ses4-1.local.site:
                      Data failed to compile:
                  ----------
                      Rendering SLS 'base:ceph.osd.default' failed: Conflicting ID 'prepare /dev/vdb'
                  ses4-2.local.site:
                      Data failed to compile:
                  ----------
                      Rendering SLS 'base:ceph.osd.default' failed: Conflicting ID 'prepare /dev/vdb'
     Started: 12:20:30.773721
    Duration: 396.396 ms
     Changes:

Summary for salt.local.site_master
------------
Succeeded: 6 (changed=2)
Failed:    1
------------
Total states run:     7
Total run time:   3.409 s

So I had to remove the old profile entries to get Salt to add the new
OSDs to my existing nodes.
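
In other words, only the new profile lines remained in policy.cfg
(other, unrelated lines not shown):

profile-2Disk50GB-1/cluster/ses4-[1234]*.sls
profile-2Disk50GB-1/stack/default/ceph/minions/ses4-[1234]*.yml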

I would have expected that I could simply add the new disks and that
Salt would notice that some disks already exist and only set up the new
ones.

Now I am wondering what will happen if I integrate a new, identical OSD
node: policy.cfg would need to contain both disk profiles, pillar.items
would show duplicate disk entries for the existing nodes again, and
stage.3 would fail.

Robert

On Thu, 2017-03-16 at 10:14 -0400, Eric Jackson wrote:
> On Thursday, March 16, 2017 12:55:49 PM Robert Grosschopff wrote:
> > 
> > *,
> > 
> > I added OSDs using Salt the following way:
> > 
> > - Add disks to system
> > - Run stage.1
> > - Modify policy.cfg
> > o add profile-NEWDISK/cluster/OSD*.sls
> > o add profile-NEWDISK/stack/default/ceph/minions/OSD*.yml
> > o REMOVE old profile-OLDDISK/cluster/OSD*.sls
> > o REMOVE old profile-OLDDISK/stack/default/ceph/minions/OSD*.yml
> > - Run stage.2
> > - Run stage.3
> > 
> > If the old profiles are not removed, 'salt \* pillar.items' will
> > show the old OSD profiles again.
> > 
> > Is this the way it is supposed to be done?
> 
> Since you modify policy.cfg to use the profile-NEWDISK, you do not
> need to 
> remove the old profiles.  However, if you have no machines that will
> ever match 
> them again and want to clean up, there's no harm.
> 
> Does the new profile contain all the disks as OSDs in the way you
> wanted?  If 
> so, do exactly what you did.  Stage 3 will see that the existing OSDs
> are 
> already done and move on to adding the blank drives as additional
> OSDs.
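> 
> A quick way to check (assuming the proposals live under the usual
> /srv/pillar/ceph/proposals directory; the exact minion file name here
> is only an example) is to look at the generated yml for the new
> profile and confirm that it lists both devices:
> 
> cat /srv/pillar/ceph/proposals/profile-2Disk50GB-1/stack/default/ceph/minions/ses4-1.local.site.yml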
> 
> If the new profile is not a simple addition of the existing disks
> (maybe you 
> replaced smaller disks and added additional disks), then removing the
> node is 
> the simpler alternative.  That is,
> 
> 1) Remove/comment out the node from policy.cfg
> 2) Run Stages 2-5 
> 3) Add the node back with new profile
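> 
> (Step 2, written out with the same salt-run pattern as the stage.3
> invocation above, would be roughly:
> 
> sudo salt-run state.orch ceph.stage.2
> sudo salt-run state.orch ceph.stage.3
> sudo salt-run state.orch ceph.stage.4
> sudo salt-run state.orch ceph.stage.5
> 
> run while the node is commented out of policy.cfg.)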
> 
> Depending on your situation, you can take that as fast or as slow as 
> necessary.  That is, do all the storage nodes you physically changed
> or do 
> them one at a time. 
> > 
> > 
> > Robert
> > 

