From knighthoot at gmail.com Mon Sep 10 07:19:26 2018
From: knighthoot at gmail.com (gna bla)
Date: Mon, 10 Sep 2018 15:19:26 +0200
Subject: [Deepsea-users] Bug with "/srv/salt/ceph/updates/restart/default.sls"
Message-ID:

Hello everyone,

For a small test project, I was asked to try out openATTIC with Ceph. Obviously I decided to use DeepSea, as the OA docs suggested. I ran into a bug in the file /srv/salt/ceph/updates/restart/default.sls.

My setup was 3 servers. I had been following your readme.md and everything was fine and dandy until I had to execute

# salt-run state.orch ceph.stage.0

It spewed a couple of errors, and together with my boss I searched for a solution. The problem was the file mentioned above; specifically, it had problems on line 3 with the "rpm" command. The default code didn't work for us, so we tried a workaround mentioned here:
https://github.com/saltstack/salt/issues/43569#issuecomment-330209788

With this workaround the third line looks like this:

{% set installed = salt['cmd.run']('/bin/sh -c "rpm -q --last kernel-default |head -1 |cut -f1 -d\ "') | replace('kernel-default-', '') %}

With the new code, everything worked fine. I honestly don't know what causes this or why it fails exactly there, but the workaround helped.

I didn't mention this above, but with the original code I got some error messages hinting at "rpm: -1: unknown", or something like that. It seems the program was able to find the kernel version but was unable to parse it. I could be wrong on this one, though, as I am not a developer :-)

Thank you for taking the time to read this.

Kind regards,
MrPiano
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
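For anyone who wants to see what that Jinja line evaluates to on a node, here is a minimal sketch of the same pipeline as a plain shell script, assuming a SUSE host with the kernel-default package installed; the variable names are illustrative only and are not part of DeepSea.

```bash
#!/bin/sh
# List installed kernel-default packages newest-first, keep the first line,
# and take the first space-separated field (the full package name,
# e.g. "kernel-default-<version>.<arch>").
newest_pkg=$(rpm -q --last kernel-default | head -1 | cut -f1 -d' ')

# Strip the "kernel-default-" prefix, mirroring the Jinja
# "| replace('kernel-default-', '')" filter, to leave only the version string.
installed=${newest_pkg#kernel-default-}
echo "$installed"
```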
From kevin.ayres at suse.com Tue Sep 18 16:49:07 2018
From: kevin.ayres at suse.com (Kevin Ayres)
Date: Tue, 18 Sep 2018 22:49:07 +0000
Subject: [Deepsea-users] stage 1 errors on Azure
Message-ID:

Hey guys, I can't seem to get past stage 1. Stage 0 completes successfully. Same output with the deepsea command. The master and minion services are running and bidirectional host resolution is good. Keys are all accepted. From what I can determine, the default files are not created by stage 0 for some reason. Thoughts?

What I'm seeing is that it fails to create /srv/pillar/ceph/proposals. I'm running through this doc line by line:
https://www.suse.com/documentation/suse-enterprise-storage-5/singlehtml/book_storage_deployment/book_storage_deployment.html#deepsea.cli

~ Kevin

salt:~ # salt-run state.orch ceph.stage.discovery
salt-api : ["Salt API is failing to authenticate - try 'systemctl restart salt-master': list index out of range"]
deepsea_minions : valid
master_minion : valid
ceph_version : valid
[ERROR ] No highstate or sls specified, no execution made
salt_master:
----------
ID: salt-api failed
Function: salt.state
Name: just.exit
Result: False
Comment: No highstate or sls specified, no execution made
Started: 22:30:53.628882
Duration: 0.647 ms
Changes:

Summary for salt_master
------------
Succeeded: 0
Failed: 1
------------
Total states run: 1
Total run time: 0.647 ms

salt:~ # !tail
tail -f /var/log/salt/master
2018-09-18 22:29:08,797 [salt.loaded.ext.runners.validate][WARNING ][8499] role-igw/cluster/igw*.sls matched no files
2018-09-18 22:29:08,797 [salt.loaded.ext.runners.validate][WARNING ][8499] role-openattic/cluster/salt.sls matched no files
2018-09-18 22:29:08,797 [salt.loaded.ext.runners.validate][WARNING ][8499] config/stack/default/global.yml matched no files
2018-09-18 22:29:08,798 [salt.loaded.ext.runners.validate][WARNING ][8499] config/stack/default/ceph/cluster.yml matched no files
2018-09-18 22:29:08,798 [salt.loaded.ext.runners.validate][WARNING ][8499] cluster/*.sls matched no files
2018-09-18 22:29:08,798 [salt.loaded.ext.runners.validate][WARNING ][8499] stack/default/ceph/minions/*.yml matched no files
2018-09-18 22:29:08,822 [salt.state ][ERROR ][8499] No highstate or sls specified, no execution made
2018-09-18 22:29:52,472 [salt.transport.ipc][ERROR ][5672] Exception occurred while handling stream: [Errno 0] Success
2018-09-18 22:29:56,797 [salt.state ][ERROR ][8759] No highstate or sls specified, no execution made
2018-09-18 22:30:53,629 [salt.state ][ERROR ][9272] No highstate or sls specified, no execution made

There's also some issue with the salt-minion.service:

● salt-minion.service - The Salt Minion
   Loaded: loaded (/usr/lib/systemd/system/salt-minion.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2018-09-18 22:46:54 UTC; 12s ago
 Main PID: 11082 (salt-minion)
.....
Sep 18 22:46:54 salt systemd[1]: Started The Salt Minion.
Sep 18 22:47:00 salt salt-minion[11082]: [ERROR ] Function cephimages.list in mine_functions failed to execute
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From joel.zhou at suse.com Wed Sep 19 00:35:04 2018
From: joel.zhou at suse.com (Joel Zhou)
Date: Wed, 19 Sep 2018 06:35:04 +0000
Subject: [Deepsea-users] stage 1 errors on Azure
Message-ID: <12D2E30B-9CAC-4A7E-987B-28AD8A4E5D32@suse.com>

Hi Kevin,

My short answer is:

Step 1, before stage 0, check your salt-api service on the salt-master node first.
```bash
zypper install -y salt-api
systemctl enable salt-api.service
systemctl start salt-api.service
```
Step 2, make sure the NTP service works correctly on all nodes, which means time is synchronized correctly on all nodes.
Step 3, reboot all your nodes, if acceptable, in case the kernel was updated somehow.
Step 4, then you have to start over again from stage 0 to 5.

Basically, DeepSea is a bunch of Salt scripts, and Salt is based on python2 and/or python3. I have no clue about your whole running stack, so I assume SLES 12 SP3 + SES 5, which works fine and is supported.
More info would be helpful, and also your purpose, such as practice on your own, or PoC/testing to meet a customer's demands.

Regards,

--
Joel Zhou
Senior Storage Technologist, APJ

Mobile: +86 18514577601
Email: joel.zhou at suse.com

From: on behalf of Kevin Ayres
Reply-To: Discussions about the DeepSea management framework for Ceph
Date: Tuesday, September 18, 2018 at 4:49 PM
To: "deepsea-users at lists.suse.com"
Subject: [Deepsea-users] stage 1 errors on Azure

[...]
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
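Joel's checklist can also be run from the master in a couple of commands; the following is a rough sketch of those pre-flight checks, assuming a working salt CLI and the ntpq tool on every minion. These are ordinary Salt and systemd calls, not something DeepSea itself ships.

```bash
#!/bin/bash
# Step 1: make sure salt-api is installed, enabled and running on the master
# before starting stage 0.
zypper install -y salt-api
systemctl enable --now salt-api.service
systemctl --no-pager status salt-api.service salt-master.service salt-minion.service

# Step 2: confirm every minion answers and has its clock synchronized.
salt '*' test.ping
salt '*' cmd.run 'ntpq -p'   # look for a '*' next to the selected time source
salt '*' cmd.run 'date -u'   # timestamps should agree within a few seconds
```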
From kevin.ayres at suse.com Wed Sep 19 13:20:43 2018
From: kevin.ayres at suse.com (Kevin Ayres)
Date: Wed, 19 Sep 2018 19:20:43 +0000
Subject: [Deepsea-users] stage 1 errors on Azure
In-Reply-To: <12D2E30B-9CAC-4A7E-987B-28AD8A4E5D32@suse.com>
References: <12D2E30B-9CAC-4A7E-987B-28AD8A4E5D32@suse.com>
Message-ID:

Thanks Joel, yes, DNS and NTP are configured and behaving correctly. SP3/SES5 from the current repo. The salt-api, master, and minion services are running (with one error). I'm walking through the Deployment guide line by line with the same result, now on my second freshly built master node. Salt output is at the bottom of this message. Key point: after stage 0, the */proposals directory has NOT been created.

Here's my build on a single flat network (Azure vNet 172.19.20.0/24):
Root ssh enabled and key-based login from master to all nodes as root. All nodes rebooted before the salt stages. All nodes using an identical image and fully patched, CPE_NAME="cpe:/o:suse:sles:12:sp3", firewall off, etc. - the Azure instance defaults.

Salt (and all nodes):~ # zypper lr -E
Repository priorities are without effect. All enabled repositories share the same priority.
 # | Alias                                                               | Name                              | Enabled | GPG Check | Refresh
---+---------------------------------------------------------------------+-----------------------------------+---------+-----------+--------
 3 | SUSE_Enterprise_Storage_5_x86_64:SUSE-Enterprise-Storage-5-Pool     | SUSE-Enterprise-Storage-5-Pool    | Yes     | (r ) Yes  | No
 5 | SUSE_Enterprise_Storage_5_x86_64:SUSE-Enterprise-Storage-5-Updates  | SUSE-Enterprise-Storage-5-Updates | Yes     | (r ) Yes  | Yes
 8 | SUSE_Linux_Enterprise_Server_12_SP3_x86_64:SLES12-SP3-Pool          | SLES12-SP3-Pool                   | Yes     | (r ) Yes  | No
10 | SUSE_Linux_Enterprise_Server_12_SP3_x86_64:SLES12-SP3-Updates       | SLES12-SP3-Updates                | Yes     | (r ) Yes  | Yes

**DNS** all nodes resolve bidirectionally. Azure takes care of DNS but I've also updated the hosts files.
salt:~ # hostname
salt
salt:~ # ping salt
PING salt.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net (127.0.0.1) 56(84) bytes of data.
64 bytes from salt.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net (127.0.0.1): icmp_seq=1 ttl=64 time=0.030 ms

104.211.27.224 Outside NAT to 172.19.20.10 salt.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net salt
172.19.20.12 mon1.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net mon1
172.19.20.13 mon2.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net mon2
172.19.20.14 mon3.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net mon3
172.19.20.15 osd1.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net osd1
172.19.20.16 osd2.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net osd2
172.19.20.17 osd3.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net osd3
172.19.20.18 igw1.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net ogw1

**NTP** all nodes on a 5-minute sync interval to the same stratum 1 server in the same geo as the Azure AZ (US East), navobs1.gatech.edu, as shown:
bash-3.2$ pssh -h pssh-hosts -l sesuser -i sudo ntpq -p
[1] 11:15:27 [SUCCESS] mon3
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*navobs1.gatech. .GPS.            1 u   19   64    1   15.596   -4.863   0.333
[2] 11:15:27 [SUCCESS] salt
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
navobs1.gatech. .GPS.
1 u 42 64 1 17.063 -6.702 0.000 [3] 11:15:27 [SUCCESS] igw1 remote refid st t when poll reach delay offset jitter ============================================================================== *navobs1.gatech. .GPS. 1 u 18 64 1 17.394 -27.874 7.663 [4] 11:15:27 [SUCCESS] osd1 remote refid st t when poll reach delay offset jitter ============================================================================== *navobs1.gatech. .GPS. 1 u 21 64 1 16.962 -3.755 0.813 [5] 11:15:27 [SUCCESS] osd2 remote refid st t when poll reach delay offset jitter ============================================================================== *navobs1.gatech. .GPS. 1 u 22 64 1 15.832 -4.709 3.062 [6] 11:15:27 [SUCCESS] osd3 remote refid st t when poll reach delay offset jitter ============================================================================== *navobs1.gatech. .GPS. 1 u 26 64 1 15.877 -3.252 19.131 [7] 11:15:27 [SUCCESS] mon1 remote refid st t when poll reach delay offset jitter ============================================================================== navobs1.gatech. .GPS. 1 u 2 64 1 16.120 -4.263 0.000 [8] 11:15:27 [SUCCESS] mon2 remote refid st t when poll reach delay offset jitter ============================================================================== navobs1.gatech. .GPS. 1 u 2 64 1 16.108 -7.713 0.959 **SALT** salt:~ # systemctl status salt-api salt-master salt-minion |grep 'active (running)' Active: active (running) since Wed 2018-09-19 18:14:39 UTC; 13min ago Active: active (running) since Wed 2018-09-19 18:14:42 UTC; 13min ago Active: active (running) since Wed 2018-09-19 18:14:41 UTC; 13min ago salt:~ # systemctl status salt-api salt-master salt-minion |grep ERROR Sep 19 18:14:49 salt salt-minion[1413]: [ERROR ] Function cephimages.list in mine_functions failed to execute salt:~ # salt-key --list-all Accepted Keys: igw1 mon1 mon2 mon3 osd1 osd2 osd3 salt Denied Keys: Unaccepted Keys: Rejected Keys: salt:~ # salt '*' test.ping salt: True osd2: True mon3: True osd3: True osd1: True mon2: True igw1: True mon1: True salt:~ # cat /srv/pillar/ceph/master_minion.sls master_minion: salt salt:~ # cat /srv/pillar/ceph/deepsea_minions.sls ... # Choose all minions deepsea_minions: '*' ... **SALT STAGES** Stage 0 is successful with no errors but does not create the proposals folder. 
salt:~ # salt-run state.orch ceph.stage.prep
deepsea_minions : valid
master_minion : valid
ceph_version : valid
[WARNING ] All minions are ready
salt_master:
  Name: sync master - Function: salt.state - Result: Changed Started: - 18:44:20.440255 Duration: 949.98 ms
  Name: salt-api - Function: salt.state - Result: Changed Started: - 18:44:21.390365 Duration: 3256.749 ms
  Name: repo master - Function: salt.state - Result: Clean Started: - 18:44:24.647227 Duration: 351.0 ms
  Name: metapackage master - Function: salt.state - Result: Clean Started: - 18:44:24.998333 Duration: 1127.063 ms
  Name: prepare master - Function: salt.state - Result: Changed Started: - 18:44:26.125514 Duration: 4109.917 ms
  Name: filequeue.remove - Function: salt.runner - Result: Changed Started: - 18:44:30.235610 Duration: 2071.199 ms
  Name: restart master - Function: salt.state - Result: Clean Started: - 18:44:32.306972 Duration: 1006.268 ms
  Name: filequeue.add - Function: salt.runner - Result: Changed Started: - 18:44:33.313369 Duration: 1352.98 ms
  Name: minions.ready - Function: salt.runner - Result: Changed Started: - 18:44:34.666528 Duration: 1891.677 ms
  Name: repo - Function: salt.state - Result: Clean Started: - 18:44:36.558363 Duration: 553.342 ms
  Name: metapackage minions - Function: salt.state - Result: Clean Started: - 18:44:37.111825 Duration: 3993.733 ms
  Name: common packages - Function: salt.state - Result: Clean Started: - 18:44:41.105706 Duration: 2434.079 ms
  Name: sync - Function: salt.state - Result: Changed Started: - 18:44:43.539897 Duration: 1381.692 ms
  Name: mines - Function: salt.state - Result: Clean Started: - 18:44:44.921708 Duration: 1657.019 ms
  Name: updates - Function: salt.state - Result: Changed Started: - 18:44:46.578853 Duration: 11183.347 ms
  Name: restart - Function: salt.state - Result: Clean Started: - 18:44:57.762346 Duration: 1553.957 ms
  Name: mds restart noop - Function: test.nop - Result: Clean Started: - 18:44:59.316442 Duration: 0.348 ms

Summary for salt_master
-------------
Succeeded: 17 (changed=8)
Failed: 0
-------------
Total states run: 17
Total run time: 38.874 s

Before running Stage 1, the /srv/pillar/ceph/proposals directory does not exist.
salt:~ # ls /srv/pillar/ceph/proposals/
ls: cannot access '/srv/pillar/ceph/proposals/': No such file or directory

That's where I'm at ... Googling.

~ Kevin

From: on behalf of Joel Zhou
Reply-To: Discussions about the DeepSea management framework for Ceph
Date: Tuesday, September 18, 2018 at 11:34 PM
To: Discussions about the DeepSea management framework for Ceph
Subject: Re: [Deepsea-users] stage 1 errors on Azure

[...]
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From ejackson at suse.com Wed Sep 19 13:37:00 2018
From: ejackson at suse.com (Eric Jackson)
Date: Wed, 19 Sep 2018 15:37:00 -0400
Subject: [Deepsea-users] stage 1 errors on Azure
In-Reply-To:
References: <12D2E30B-9CAC-4A7E-987B-28AD8A4E5D32@suse.com>
Message-ID: <1624041.v2sRp242nD@fury.home>

Hi Kevin,
  Stage 0 only does the "preparation" part. That is, sync'ing Salt modules, zypper updates, etc. Stage 1 is the "discovery" part that interrogates the minions and then creates the roles and storage fragments. If your salt-api issue is resolved, Stage 1 should run relatively quickly.

Eric

On Wednesday, September 19, 2018 3:20:43 PM EDT Kevin Ayres wrote:
> [...]
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 473 bytes
Desc: This is a digitally signed message part.
URL:
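For reference, the named stages used in this thread also have numbered aliases in DeepSea, so the same orchestrations can be written either way; a quick sketch using only the targets already shown above:

```bash
# Stage 0, "prep": sync Salt modules, apply updates, restart services if needed.
salt-run state.orch ceph.stage.0       # same run as: salt-run state.orch ceph.stage.prep

# Stage 1, "discovery": interrogate the minions and write the proposals tree.
salt-run state.orch ceph.stage.1       # same run as: salt-run state.orch ceph.stage.discovery

# Only after a successful discovery run should the proposals exist:
ls /srv/pillar/ceph/proposals/
```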
From kevin.ayres at suse.com Wed Sep 19 13:56:17 2018
From: kevin.ayres at suse.com (Kevin Ayres)
Date: Wed, 19 Sep 2018 19:56:17 +0000
Subject: [Deepsea-users] stage 1 errors on Azure
In-Reply-To: <1624041.v2sRp242nD@fury.home>
References: <12D2E30B-9CAC-4A7E-987B-28AD8A4E5D32@suse.com> <1624041.v2sRp242nD@fury.home>
Message-ID: <33994725-E346-4599-842A-042E5DBFA138@suse.com>

Thanks Eric, Yes, I understand this but worded it poorly. I don't see any issues with NTP or DNS. Something else is amiss. Should deepsea be installed after salt as outlined in the deployment doc, or before?
salt:~ # salt-run state.orch ceph.stage.discovery
salt-api : ["Salt API is failing to authenticate - try 'systemctl restart salt-master': list index out of range"]
deepsea_minions : valid
master_minion : valid
ceph_version : valid
[ERROR ] No highstate or sls specified, no execution made
salt_master:
----------
ID: salt-api failed
Function: salt.state
Name: just.exit
Result: False
Comment: No highstate or sls specified, no execution made
Started: 19:38:41.962044
Duration: 0.734 ms
Changes:

Summary for salt_master
------------
Succeeded: 0
Failed: 1
------------
Total states run: 1
Total run time: 0.734 ms

salt:~ # tail -f /var/log/salt/master
2018-09-19 18:44:36,555 [salt.loaded.ext.runners.minions][WARNING ][15319] All minions are ready
2018-09-19 19:38:41,955 [salt.transport.ipc][ERROR ][1626] Exception occurred while handling stream: [Errno 0] Success
2018-09-19 19:38:41,962 [salt.state ][ERROR ][40826] No highstate or sls specified, no execution made

salt:~ # ls /srv/pillar/ceph/proposals
ls: cannot access '/srv/pillar/ceph/proposals': No such file or directory

salt:~ # ls /srv/pillar/ceph/
benchmarks deepsea_minions.sls deepsea_minions.sls.rpmsave init.sls master_minion.sls master_minion.sls.rpmsave stack

~ Kevin

On 9/19/18, 12:37 PM, "deepsea-users-bounces at lists.suse.com on behalf of Eric Jackson" wrote:
[...]
From ejackson at suse.com Wed Sep 19 14:39:28 2018
From: ejackson at suse.com (Eric Jackson)
Date: Wed, 19 Sep 2018 16:39:28 -0400
Subject: [Deepsea-users] stage 1 errors on Azure
In-Reply-To: <33994725-E346-4599-842A-042E5DBFA138@suse.com>
References: <12D2E30B-9CAC-4A7E-987B-28AD8A4E5D32@suse.com> <1624041.v2sRp242nD@fury.home> <33994725-E346-4599-842A-042E5DBFA138@suse.com>
Message-ID: <7625052.XIqpMaYcJK@fury.home>

The rpm would be installed after Salt is configured. I understand some installations install both Salt and DeepSea via YaST. We did try to make accommodations for that scenario.

So, your salt-api is still down. We ran into drastically different reasons why the salt-api can fail. We did not have the bandwidth to address each type of failure. The check we do is a curl command to verify that the salt-api is answering.
curl -si localhost:8000/login -H "Accept: application/json" -d username=admin -d sharedsecret=xxx -d eauth=sharedsecret

The sharedsecret is in /etc/salt/master/sharedsecret.conf. It's possible to have the Salt master remember the previous contents of that file and the Salt API use the current contents if an admin does things just right :) . That's why we give the error message about restarting the Salt master. However, the above curl command can also fail because the salt-api is down, or maybe localhost is not defined in /etc/hosts (ran into that once). The curl command may shed more light on the failure.

***
As far as how you would have found that curl command: take a look at the contents of /srv/modules/runners/validate.py. About line 626, you will see the python code is literally calling the curl command. In other words, do not be intimidated by the python code. Much of it is calling some of the same command line tools that you would use. We are just using Salt to do much of this in parallel.

It's also possible to run this directly without invoking Stage 1.

# salt-run validate.saltapi

Although the validations can be frustrating, not having them is worse. The situations where we did not check for the Salt API led to incredibly painful debug sessions.

Eric

On Wednesday, September 19, 2018 3:56:17 PM EDT Kevin Ayres wrote:
> [...]
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 473 bytes
Desc: This is a digitally signed message part.
URL:
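Eric's check can be wrapped in a couple of lines so the shared secret does not have to be pasted by hand. This is only a sketch: it assumes the file he names holds a single "sharedsecret: <value>" line, so adjust the awk if your layout differs.

```bash
#!/bin/bash
# Pull the shared secret out of the file Eric mentions and attempt the same
# login the DeepSea validation performs. A JSON reply with a token means the
# salt-api is answering; a connection error points at the service itself or
# at localhost resolution.
SECRET=$(awk '/sharedsecret:/ {print $2}' /etc/salt/master/sharedsecret.conf)

curl -si localhost:8000/login \
  -H "Accept: application/json" \
  -d username=admin \
  -d sharedsecret="$SECRET" \
  -d eauth=sharedsecret

# The same check, run through DeepSea's own validation runner:
salt-run validate.saltapi
```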
From kevin.ayres at suse.com Wed Sep 19 16:03:05 2018
From: kevin.ayres at suse.com (Kevin Ayres)
Date: Wed, 19 Sep 2018 22:03:05 +0000
Subject: [Deepsea-users] stage 1 errors on Azure
In-Reply-To: <7625052.XIqpMaYcJK@fury.home>
References: <12D2E30B-9CAC-4A7E-987B-28AD8A4E5D32@suse.com> <1624041.v2sRp242nD@fury.home> <33994725-E346-4599-842A-042E5DBFA138@suse.com> <7625052.XIqpMaYcJK@fury.home>
Message-ID:

Thanks Eric! I understand. Couldn't find localhost - DOH moment. salt-master restarted fine each time without errors, but the api was failing.

salt:~ # salt-run validate.saltapi
salt-api : ["Salt API is failing to authenticate - try 'systemctl restart salt-master': list index out of range"]
False

localhost was missing due to the heavy /etc/hosts modifications I made for Azure instance resolution. I just appended localhost onto "127.0.0.1 salt.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net salt"

salt:~ # salt-run validate.saltapi
salt-api : valid

salt:~ # curl -si localhost:8000/login -H "Accept: application/json" -d username=admin -d sharedsecret=fbc39dd2-2bba-42ec-ab9b-7d9e71b84047 -d eauth=sharedsecret
HTTP/1.1 200 OK
Content-Length: 204
Access-Control-Expose-Headers: GET, POST
Vary: Accept-Encoding
Server: CherryPy/3.6.0
Allow: GET, HEAD, POST
Access-Control-Allow-Credentials: true
Date: Wed, 19 Sep 2018 21:39:28 GMT
Access-Control-Allow-Origin: *
X-Auth-Token: 640bb306bd8fb202ef71757aac83f0db9beb4e11
Content-Type: application/json
Set-Cookie: session_id=640bb306bd8fb202ef71757aac83f0db9beb4e11; expires=Thu, 20 Sep 2018 07:39:28 GMT; Path=/

{"return": [{"perms": [".*", "@runner", "@wheel"], "start": 1537393168.443645, "token": "640bb306bd8fb202ef71757aac83f0db9beb4e11", "expire": 1537436368.443646, "user": "admin", "eauth": "sharedsecret"}]}salt:~ #

Now Stage 1 runs through.

salt:~ # salt-run state.orch ceph.stage.discovery
salt-api : valid
deepsea_minions : valid
master_minion : valid
ceph_version : valid
[WARNING ] All minions are ready
{}
salt_master:
  Name: minions.ready - Function: salt.runner - Result: Changed Started: - 21:59:25.186357 Duration: 1528.309 ms
  Name: refresh_pillar0 - Function: salt.state - Result: Changed Started: - 21:59:26.714801 Duration: 340.017 ms
  Name: populate.proposals - Function: salt.runner - Result: Changed Started: - 21:59:27.055255 Duration: 5107.852 ms
  Name: proposal.populate - Function: salt.runner - Result: Changed Started: - 21:59:32.163281 Duration: 2578.835 ms

Summary for salt_master
------------
Succeeded: 4 (changed=4)
Failed: 0
------------
Total states run: 4
Total run time: 9.555 s

And the proposals exist.

salt:~ # ls /srv/pillar/ceph/proposals/
cluster-ceph config role-admin role-client-cephfs role-client-nfs role-ganesha role-master role-mgr role-openattic
cluster-unassigned profile-default role-benchmark-rbd role-client-iscsi role-client-radosgw role-igw role-mds role-mon role-rgw

Thanks again Eric!

~ Kevin

From: on behalf of Eric Jackson
Reply-To: Discussions about the DeepSea management framework for Ceph
Date: Wednesday, September 19, 2018 at 1:39 PM
To: "deepsea-users at lists.suse.com"
Subject: Re: [Deepsea-users] stage 1 errors on Azure

[...]
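Kevin's fix boils down to making sure the name the validation curls actually resolves on the master. A small sketch of that check using only standard tools; the FQDN in the comment is the one from his mail and is specific to his Azure vNet.

```bash
#!/bin/bash
# The loopback entry should carry "localhost" in addition to the node's
# Azure-internal FQDN and short name, e.g.:
#   127.0.0.1  salt.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net  salt  localhost
getent hosts localhost || echo "localhost does not resolve - check /etc/hosts"

# With name resolution fixed, the DeepSea check should pass again:
salt-run validate.saltapi
```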
The check we do is a curl command to verify that the salt-api is answering. curl -si localhost:8000/login -H "Accept: application/json" -d username=admin -d sharedsecret=xxx -d eauth=sharedsecret The sharedsecret is in /etc/salt/master/sharedsecret.conf. It's possible to have the Salt master remember the previous contents of that file and the Salt api to use the current contents if an admin does things just right :) . That's why we give the error message about restarting the Salt master. However, if the above curl command fails because the Salt-api is down or maybe localhost is not defined in /etc/hosts (ran into that once). The curl command may shed more light on the failure. *** As far as how would you have found that curl command. Take a look at the contents of /srv/modules/runners/validate.py. About line 626, you will see the python code is literally calling the curl command. In other words, do not be intimidated about the python code. Much of it is calling some of the same command line tools that you would use. We are just using Salt to do much of this in parallel. It's also possible to run this directly without invoking Stage 1. # salt-run validate.saltapi Although the validations can be frustrating, not having them is worse. The situations where we did not check for Salt api lead to incredibly painful debug sessions. Eric On Wednesday, September 19, 2018 3:56:17 PM EDT Kevin Ayres wrote: > Thanks Eric, Yes, I understand this but worded it poorly. I don't see any > issues with NTP or DNS. Something else is amiss. Should deepsea be > installed after salt as outlined in the deployment doc, or before? > salt:~ # salt-run state.orch ceph.stage.discovery > salt-api : ["Salt API is failing to authenticate - try > 'systemctl restart salt-master': list index out of range"] deepsea_minions > : valid > master_minion : valid > ceph_version : valid > [ERROR ] No highstate or sls specified, no execution made > salt_master: > ---------- > ID: salt-api failed > Function: salt.state > Name: just.exit > Result: False > Comment: No highstate or sls specified, no execution made > Started: 19:38:41.962044 > Duration: 0.734 ms > Changes: > > Summary for salt_master > ------------ > Succeeded: 0 > Failed: 1 > ------------ > Total states run: 1 > Total run time: 0.734 ms > > > salt:~ # tail -f /var/log/salt/master > 2018-09-19 18:44:36,555 [salt.loaded.ext.runners.minions][WARNING ][15319] > All minions are ready 2018-09-19 19:38:41,955 [salt.transport.ipc][ERROR > ][1626] Exception occurred while handling stream: [Errno 0] Success > 2018-09-19 19:38:41,962 [salt.state ][ERROR ][40826] No highstate > or sls specified, no execution made > salt:~ # ls /srv/pillar/ceph/proposals > ls: cannot access '/srv/pillar/ceph/proposals': No such file or directory > > salt:~ # ls /srv/pillar/ceph/ > benchmarks deepsea_minions.sls deepsea_minions.sls.rpmsave > init.sls master_minion.sls master_minion.sls.rpmsave stack > > ~ Kevin > > On 9/19/18, 12:37 PM, "deepsea-users-bounces at lists.suse.com on behalf of > Eric Jackson" ejackson at suse.com> wrote: > Hi Kevin, > Stage 0 only does the "preparation" part. That is, sync'ing salt > modules, zypper updates, etc. Stage 1 is the "discovery" part that > interrogates the minions and then creates the roles and storage fragments. > If your salt-api issue is resolved, Stage 1 should run relatively quick. > > Eric > > On Wednesday, September 19, 2018 3:20:43 PM EDT Kevin Ayres wrote: > > > Thanks Joel, yes DNS, NTP is configured and behaving correctly. 
> > SP3/SES5 > > from current repo. salt-api service, master, minion service running > > (with > > one error.) > > I?m walking through the Deployment guide line by line with > > > same result, now on my second freshly built master node. Salt output > > is at > > the bottom of this message. Key: After stage 0, the */proposals > > directory > > has NOT been created. > > Here?s my build on a single flat network(Azure vNet 172.19.20.0/24): > > Root ssh enabled and key based login from master to all nodes as root. > > All > > nodes rebooted before salt stage. > > All nodes using identical image and > > > fully patched CPE_NAME="cpe:/o:suse:sles:12:sp3", firewall off, etc. - > > the > > Azure instance defaults. > > Salt (and all nodes):~ # zypper lr -E > > Repository priorities are without effect. All enabled repositories > > share the same priority. > > # | Alias > > > | Name | Enabled | GPG Check > > | | > > > > Refresh > > ---+------------------------------------------------------------------ > > --+-- > > ---------------------------------+---------+-----------+-------- 3 | > > SUSE_Enterprise_Storage_5_x86_64:SUSE-Enterprise-Storage-5-Pool | > > SUSE-Enterprise-Storage-5-Pool | Yes | (r ) Yes | No 5 | > > SUSE_Enterprise_Storage_5_x86_64:SUSE-Enterprise-Storage-5-Updates | > > SUSE-Enterprise-Storage-5-Updates | Yes | (r ) Yes | Yes 8 | > > SUSE_Linux_Enterprise_Server_12_SP3_x86_64:SLES12-SP3-Pool | > > SLES12-SP3-Pool | Yes | (r ) Yes | No 10 | > > SUSE_Linux_Enterprise_Server_12_SP3_x86_64:SLES12-SP3-Updates | > > SLES12-SP3-Updates | Yes | (r ) Yes | Yes > > **DNS** all nodes resolve bidirectionally. Azure cares for DNS but > > I?ve also updated hosts files. > > salt:~ # hostname > > > salt > > salt:~ # ping salt > > PING salt.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net > > (127.0.0.1) > > 56(84) bytes of data. > > 64 bytes from > > > salt.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net (127.0.0.1): > > icmp_seq=1 ttl=64 time=0.030 ms > > 104.211.27.224 Outside NAT to 172.19.20.10 > > salt.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net salt > > 172.19.20.12 > > > mon1.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net mon1 > > > > 172.19.20.13 > > mon2.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net > > mon2 172.19.20.14 > > mon3.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net mon3 > > 172.19.20.15 > > > osd1.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net osd1 > > > > 172.19.20.16 > > osd2.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net > > osd2 172.19.20.17 > > osd3.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net osd3 > > 172.19.20.18 > > > igw1.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net ogw1 > > > > **NTP** all nodes 5 minutes sync interval to same Stratum 1 server in > > same > > GEO as Azure AZ: (US East) navobs1.gatech.edu as shown: > > bash-3.2$ pssh -h > > > pssh-hosts -l sesuser -i sudo ntpq -p > > [1] 11:15:27 [SUCCESS] mon3 > > > > remote refid st t when poll reach delay offset > > > > > > jitter > > ====================================================================== > > ===== === *navobs1.gatech. .GPS. 1 u 19 64 1 > > 15.596 -4.863 0.333 [2] 11:15:27 [SUCCESS] salt > > > > remote refid st t when poll reach delay offset > > > > > > jitter > > ====================================================================== > > ===== === navobs1.gatech. .GPS. 
1 u 42 64 1 > > 17.063 -6.702 0.000 [3] 11:15:27 [SUCCESS] igw1 > > > > remote refid st t when poll reach delay offset > > > > > > jitter > > ====================================================================== > > ===== === *navobs1.gatech. .GPS. 1 u 18 64 1 > > 17.394 -27.874 7.663 [4] 11:15:27 [SUCCESS] osd1 > > > > remote refid st t when poll reach delay offset > > > > > > jitter > > ====================================================================== > > ===== === *navobs1.gatech. .GPS. 1 u 21 64 1 > > 16.962 -3.755 0.813 [5] 11:15:27 [SUCCESS] osd2 > > > > remote refid st t when poll reach delay offset > > > > > > jitter > > ====================================================================== > > ===== === *navobs1.gatech. .GPS. 1 u 22 64 1 > > 15.832 -4.709 3.062 [6] 11:15:27 [SUCCESS] osd3 > > > > remote refid st t when poll reach delay offset > > > > > > jitter > > ====================================================================== > > ===== === *navobs1.gatech. .GPS. 1 u 26 64 1 > > 15.877 -3.252 19.131 [7] 11:15:27 [SUCCESS] mon1 > > > > remote refid st t when poll reach delay offset > > > > > > jitter > > ====================================================================== > > ===== === navobs1.gatech. .GPS. 1 u 2 64 1 > > 16.120 -4.263 0.000 [8] 11:15:27 [SUCCESS] mon2 > > > > remote refid st t when poll reach delay offset > > > > > > jitter > > ====================================================================== > > ===== === navobs1.gatech. .GPS. 1 u 2 64 1 > > 16.108 -7.713 0.959 > > > > **SALT** > > salt:~ # systemctl status salt-api salt-master salt-minion |grep > > 'active > > (running)' > > Active: active (running) since Wed 2018-09-19 18:14:39 UTC; > > > 13min ago Active: active (running) since Wed 2018-09-19 18:14:42 UTC; > > 13min ago Active: active (running) since Wed 2018-09-19 18:14:41 > > UTC; 13min ago salt:~ # systemctl status salt-api salt-master > > salt-minion |grep ERROR > > > Sep 19 18:14:49 salt salt-minion[1413]: [ERROR ] Function > > > > cephimages.list in mine_functions failed to execute > > > > > salt:~ # salt-key --list-all > > > > Accepted Keys: > > igw1 > > mon1 > > mon2 > > mon3 > > osd1 > > osd2 > > osd3 > > salt > > Denied Keys: > > Unaccepted Keys: > > Rejected Keys: > > > > > > salt:~ # salt '*' test.ping > > salt: > > > > True > > > > osd2: > > > > True > > > > mon3: > > > > True > > > > osd3: > > > > True > > > > osd1: > > > > True > > > > mon2: > > > > True > > > > igw1: > > > > True > > > > mon1: > > > > True > > > > > > salt:~ # cat /srv/pillar/ceph/master_minion.sls > > master_minion: salt > > > > salt:~ # cat /srv/pillar/ceph/deepsea_minions.sls > > ... > > # Choose all minions > > deepsea_minions: '*' > > ... > > > > **SALT STAGES** > > Stage 0 is successful with no errors but does not create the > > proposals > > folder. 
> > > > > salt:~ # salt-run state.orch ceph.stage.prep > > > > deepsea_minions : valid > > master_minion : valid > > ceph_version : valid > > [WARNING ] All minions are ready > > salt_master: > > > > Name: sync master - Function: salt.state - Result: Changed > > Started: - > > > > 18:44:20.440255 Duration: 949.98 ms > > Name: salt-api - Function: salt.state > > > - Result: Changed Started: - 18:44:21.390365 Duration: 3256.749 ms > > Name: > > repo master - Function: salt.state - Result: Clean Started: - > > 18:44:24.647227 Duration: 351.0 ms Name: metapackage master - > > Function: > > salt.state - Result: Clean Started: - 18:44:24.998333 Duration: > > 1127.063 ms Name: prepare master - Function: salt.state - Result: > > Changed Started: - 18:44:26.125514 Duration: 4109.917 ms Name: > > filequeue.remove - Function: salt.runner - Result: Changed Started: - > > 18:44:30.235610 Duration: 2071.199 ms Name: restart master - > > Function: salt.state - Result: Clean Started: - 18:44:32.306972 > > Duration: 1006.268 ms Name: filequeue.add - Function: salt.runner - > > Result: Changed Started: - 18:44:33.313369 Duration: 1352.98 ms Name: > > minions.ready - Function: salt.runner - Result: Changed Started: - > > 18:44:34.666528 Duration: 1891.677 ms Name: repo - Function: > > salt.state - Result: Clean Started: - 18:44:36.558363 Duration: > > 553.342 ms Name: metapackage minions - Function: salt.state - Result: > > Clean Started: - 18:44:37.111825 Duration: 3993.733 ms Name: common > > packages - Function: salt.state - Result: Clean Started: - > > 18:44:41.105706 Duration: 2434.079 ms Name: sync - Function: > > salt.state - Result: Changed Started: - > > 18:44:43.539897 Duration: 1381.692 ms Name: mines - Function: > > salt.state - > > Result: Clean Started: - 18:44:44.921708 Duration: 1657.019 ms Name: > > updates - Function: salt.state - Result: Changed Started: - > > 18:44:46.578853 Duration: 11183.347 ms Name: restart - Function: > > salt.state - Result: Clean Started: - 18:44:57.762346 Duration: > > 1553.957 ms Name: mds restart noop - Function: test.nop - Result: > > Clean Started: - 18:44:59.316442 Duration: 0.348 ms > > > > Summary for salt_master > > ------------- > > Succeeded: 17 (changed=8) > > Failed: 0 > > ------------- > > Total states run: 17 > > Total run time: 38.874 s > > > > > > > > Before running Stage 1, the /srv/pillar/ceph/proposals directory does > > not > > exist. > > salt:~ # ls /srv/pillar/ceph/proposals/ > > > ls: cannot access '/srv/pillar/ceph/proposals/': No such file or > > > > directory > > > > > That?s where I?m at ? Googling.. > > > > ~ Kevin > > > > From: on behalf of Joel Zhou > > > > Reply-To: Discussions about the DeepSea management > > > framework for Ceph Date: Tuesday, > > September > > 18, 2018 at 11:34 PM > > To: Discussions about the DeepSea management framework for Ceph > > > > Subject: Re: [Deepsea-users] stage 1 errors > > > on Azure > > > > Hi Kevin, > > > > My short answer is, > > > > Step 1, before stage 0, check your salt-api service on salt-master > > node > > first. > > ```bash > > > zypper install -y salt-api > > systemctl enable salt-api.service > > systemctl start salt-api.service > > ``` > > Step 2, make sure NTP service works correctly on all nodes, which > > means time synchronized correctly on all nodes. > > Step 3, reboot all your nodes, if > > > acceptable. In case of kernel updated somehow. Step 4, then you have > > to > > start over again from stage 0 to 5. 
> > > > Basically, deepsea is a bunch of salt scripts, and salt based on > > python2 > > and/or python3. > > I have no clues about your whole running stack, so assume > > > SLES 12 sp3 + SES 5, which works fine and supported. More info would > > be > > helpful, and also your purpose, such as for practice on your own, or > > for > > PoC/testing to meet customer?s demands. > > Regards, > > > > -- > > Joel Zhou ??? > > Senior Storage Technologist, APJ > > > > Mobile: +86 18514577601 > > Email: joel.zhou at suse.com > > > > From: on behalf of Kevin Ayres > > > > Reply-To: Discussions about the DeepSea management > > > framework for Ceph Date: Tuesday, > > September > > 18, 2018 at 4:49 PM > > To: "deepsea-users at lists.suse.com" > > Subject: [Deepsea-users] stage 1 errors on Azure > > > > Hey guys, I can?t seem to get past stage 1. Stage 0 complete > > successfully. > > Same output with deepsea command. The master and minion service are > > running and bidirectional host resolution are good. Keys are all > > accepted. From what I can determine, the default files are not > > created by stage 0 for some reason. Thoughts? What I?m seeing is that > > it fails to create the > > /srv/pillar/ceph/proposals > > > > > I?m running through this doc line by line: > > https://www.suse.com/documentation/suse-enterprise-storage-5/singlehtm > > l/boo k_storage_deployment/book_storage_deployment.html#deepsea.cli > > > > > ~ Kevin > > > > > > salt:~ # salt-run state.orch ceph.stage.discovery > > > > salt-api : ["Salt API is failing to authenticate - > > try > > 'systemctl restart salt-master': list index out of range"] > > > > > deepsea_minions : valid > > > > master_minion : valid > > > > ceph_version : valid > > > > [ERROR ] No highstate or sls specified, no execution made > > > > salt_master: > > > > ---------- > > > > > > ID: salt-api failed > > > > > > > > Function: salt.state > > > > > > > > Name: just.exit > > > > > > > > Result: False > > > > > > > > Comment: No highstate or sls specified, no execution made > > > > > > > > Started: 22:30:53.628882 > > > > > > > > Duration: 0.647 ms > > > > > > > > Changes: > > > > > > > > > > Summary for salt_master > > > > ------------ > > > > Succeeded: 0 > > > > Failed: 1 > > > > ------------ > > > > Total states run: 1 > > > > Total run time: 0.647 ms > > > > salt:~ # !tail > > tail -f /var/log/salt/master > > 2018-09-18 22:29:08,797 [salt.loaded.ext.runners.validate][WARNING > > ][8499] > > role-igw/cluster/igw*.sls matched no files > > 2018-09-18 22:29:08,797 > > > [salt.loaded.ext.runners.validate][WARNING ][8499] > > role-openattic/cluster/salt.sls matched no files 2018-09-18 > > 22:29:08,797 > > [salt.loaded.ext.runners.validate][WARNING ][8499] > > config/stack/default/global.yml matched no files 2018-09-18 > > 22:29:08,798 > > [salt.loaded.ext.runners.validate][WARNING ][8499] > > config/stack/default/ceph/cluster.yml matched no files 2018-09-18 > > 22:29:08,798 [salt.loaded.ext.runners.validate][WARNING ][8499] > > cluster/*.sls matched no files 2018-09-18 22:29:08,798 > > [salt.loaded.ext.runners.validate][WARNING ][8499] > > stack/default/ceph/minions/*.yml matched no files 2018-09-18 > > 22:29:08,822 > > [salt.state ][ERROR ][8499] No highstate or sls specified, no > > execution made 2018-09-18 22:29:52,472 [salt.transport.ipc][ERROR > > ][5672] Exception occurred while handling stream: [Errno 0] Success > > 2018-09-18 22:29:56,797 [salt.state ][ERROR ][8759] No > > highstate or sls specified, no execution made 2018-09-18 22:30:53,629 > > [salt.state 
][ERROR ][9272] No highstate or sls specified, no execution made
> > There's also some issue with the salt-minion.service:
> > * salt-minion.service - The Salt Minion
> >    Loaded: loaded (/usr/lib/systemd/system/salt-minion.service; enabled; vendor preset: disabled)
> >    Active: active (running) since Tue 2018-09-18 22:46:54 UTC; 12s ago
> >  Main PID: 11082 (salt-minion)
> > .....
> > Sep 18 22:46:54 salt systemd[1]: Started The Salt Minion.
> > Sep 18 22:47:00 salt salt-minion[11082]: [ERROR ] Function cephimages.list in mine_functions failed to execute
>
> _______________________________________________
> Deepsea-users mailing list
> Deepsea-users at lists.suse.com
> http://lists.suse.com/mailman/listinfo/deepsea-users
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From joel.zhou at suse.com Wed Sep 19 16:46:59 2018
From: joel.zhou at suse.com (Joel Zhou)
Date: Wed, 19 Sep 2018 22:46:59 +0000
Subject: [Deepsea-users] stage 1 errors on Azure
In-Reply-To:
References: <12D2E30B-9CAC-4A7E-987B-28AD8A4E5D32@suse.com> <1624041.v2sRp242nD@fury.home> <33994725-E346-4599-842A-042E5DBFA138@suse.com> <7625052.XIqpMaYcJK@fury.home>
Message-ID:

Kevin,

In other words, your DNS is NOT configured and behaving correctly. Personally, I'd append every single record to the hosts file, including the salt master, even without a localhost record. Anyway, you are good to go.

Regards,
--
Joel Zhou
Senior Storage Technologist, APJ

Mobile: +86 18514577601
Email: joel.zhou at suse.com

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From kevin.ayres at suse.com Wed Sep 19 17:04:11 2018
From: kevin.ayres at suse.com (Kevin Ayres)
Date: Wed, 19 Sep 2018 23:04:11 +0000
Subject: [Deepsea-users] stage 1 errors on Azure
In-Reply-To:
References: <12D2E30B-9CAC-4A7E-987B-28AD8A4E5D32@suse.com> <1624041.v2sRp242nD@fury.home> <33994725-E346-4599-842A-042E5DBFA138@suse.com> <7625052.XIqpMaYcJK@fury.home>
Message-ID:

Yup indeed! Thanks Joel. I have new issues at stage 2 but will start a new thread later. UG. Azure/AWS oddities may need to be a whole topic at some point. Not that I'd do this on AWS except for deployment or automation testing..

~ Kevin

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
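A quick aside before the stage 2 thread below: the name-resolution trap that broke the salt-api validation above is cheap to check for up front. This is a sketch rather than anything from the DeepSea docs; run it on the master and adjust names to your environment.

```bash
# Sanity checks for the /etc/hosts problem discussed above. Both lookups must
# succeed on the master for the salt-api login (and validate.saltapi) to work.
getent hosts localhost          # must resolve, typically to 127.0.0.1
getent hosts "$(hostname -f)"   # the master's FQDN must resolve as well
hostname -s                     # should print the real short name, not "localhost"
```

If any of these come back wrong (for example the node's own name resolving only to 127.0.0.1, as in the ping output earlier in the thread), later per-host artifacts such as the cached keyrings in the next message can end up named after the wrong host.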
salt:~ # salt-run state.orch ceph.stage.discovery salt-api : valid deepsea_minions : valid master_minion : valid ceph_version : valid [WARNING ] All minions are ready {} salt_master: Name: minions.ready - Function: salt.runner - Result: Changed Started: - 21:59:25.186357 Duration: 1528.309 ms Name: refresh_pillar0 - Function: salt.state - Result: Changed Started: - 21:59:26.714801 Duration: 340.017 ms Name: populate.proposals - Function: salt.runner - Result: Changed Started: - 21:59:27.055255 Duration: 5107.852 ms Name: proposal.populate - Function: salt.runner - Result: Changed Started: - 21:59:32.163281 Duration: 2578.835 ms Summary for salt_master ------------ Succeeded: 4 (changed=4) Failed: 0 ------------ Total states run: 4 Total run time: 9.555 s And Proposals exist. salt:~ # ls /srv/pillar/ceph/proposals/ cluster-ceph config role-admin role-client-cephfs role-client-nfs role-ganesha role-master role-mgr role-openattic cluster-unassigned profile-default role-benchmark-rbd role-client-iscsi role-client-radosgw role-igw role-mds role-mon role-rgw Thanks again Eric! ~ Kevin From: on behalf of Eric Jackson Reply-To: Discussions about the DeepSea management framework for Ceph Date: Wednesday, September 19, 2018 at 1:39 PM To: "deepsea-users at lists.suse.com" Subject: Re: [Deepsea-users] stage 1 errors on Azure The rpm would be installed after Salt is configured. I understand some installations install both Salt and DeepSea via YaST. We did try to make accommodations for that scenario. So, your salt-api is still down. We ran into drastically different reasons why the salt-api can fail. We did not have the bandwidth to address each type of failure. The check we do is a curl command to verify that the salt-api is answering. curl -si localhost:8000/login -H "Accept: application/json" -d username=admin -d sharedsecret=xxx -d eauth=sharedsecret The sharedsecret is in /etc/salt/master/sharedsecret.conf. It's possible to have the Salt master remember the previous contents of that file and the Salt api to use the current contents if an admin does things just right :) . That's why we give the error message about restarting the Salt master. However, if the above curl command fails because the Salt-api is down or maybe localhost is not defined in /etc/hosts (ran into that once). The curl command may shed more light on the failure. *** As far as how would you have found that curl command. Take a look at the contents of /srv/modules/runners/validate.py. About line 626, you will see the python code is literally calling the curl command. In other words, do not be intimidated about the python code. Much of it is calling some of the same command line tools that you would use. We are just using Salt to do much of this in parallel. It's also possible to run this directly without invoking Stage 1. # salt-run validate.saltapi Although the validations can be frustrating, not having them is worse. The situations where we did not check for Salt api lead to incredibly painful debug sessions. Eric On Wednesday, September 19, 2018 3:56:17 PM EDT Kevin Ayres wrote: > Thanks Eric, Yes, I understand this but worded it poorly. I don't see any > issues with NTP or DNS. Something else is amiss. Should deepsea be > installed after salt as outlined in the deployment doc, or before? 
> salt:~ # salt-run state.orch ceph.stage.discovery > salt-api : ["Salt API is failing to authenticate - try > 'systemctl restart salt-master': list index out of range"] deepsea_minions > : valid > master_minion : valid > ceph_version : valid > [ERROR ] No highstate or sls specified, no execution made > salt_master: > ---------- > ID: salt-api failed > Function: salt.state > Name: just.exit > Result: False > Comment: No highstate or sls specified, no execution made > Started: 19:38:41.962044 > Duration: 0.734 ms > Changes: > > Summary for salt_master > ------------ > Succeeded: 0 > Failed: 1 > ------------ > Total states run: 1 > Total run time: 0.734 ms > > > salt:~ # tail -f /var/log/salt/master > 2018-09-19 18:44:36,555 [salt.loaded.ext.runners.minions][WARNING ][15319] > All minions are ready 2018-09-19 19:38:41,955 [salt.transport.ipc][ERROR > ][1626] Exception occurred while handling stream: [Errno 0] Success > 2018-09-19 19:38:41,962 [salt.state ][ERROR ][40826] No highstate > or sls specified, no execution made > salt:~ # ls /srv/pillar/ceph/proposals > ls: cannot access '/srv/pillar/ceph/proposals': No such file or directory > > salt:~ # ls /srv/pillar/ceph/ > benchmarks deepsea_minions.sls deepsea_minions.sls.rpmsave > init.sls master_minion.sls master_minion.sls.rpmsave stack > > ~ Kevin > > On 9/19/18, 12:37 PM, "deepsea-users-bounces at lists.suse.com on behalf of > Eric Jackson" ejackson at suse.com> wrote: > Hi Kevin, > Stage 0 only does the "preparation" part. That is, sync'ing salt > modules, zypper updates, etc. Stage 1 is the "discovery" part that > interrogates the minions and then creates the roles and storage fragments. > If your salt-api issue is resolved, Stage 1 should run relatively quick. > > Eric > > On Wednesday, September 19, 2018 3:20:43 PM EDT Kevin Ayres wrote: > > > Thanks Joel, yes DNS, NTP is configured and behaving correctly. > > SP3/SES5 > > from current repo. salt-api service, master, minion service running > > (with > > one error.) > > I?m walking through the Deployment guide line by line with > > > same result, now on my second freshly built master node. Salt output > > is at > > the bottom of this message. Key: After stage 0, the */proposals > > directory > > has NOT been created. > > Here?s my build on a single flat network(Azure vNet 172.19.20.0/24): > > Root ssh enabled and key based login from master to all nodes as root. > > All > > nodes rebooted before salt stage. > > All nodes using identical image and > > > fully patched CPE_NAME="cpe:/o:suse:sles:12:sp3", firewall off, etc. - > > the > > Azure instance defaults. > > Salt (and all nodes):~ # zypper lr -E > > Repository priorities are without effect. All enabled repositories > > share the same priority. > > # | Alias > > > | Name | Enabled | GPG Check > > | | > > > > Refresh > > ---+------------------------------------------------------------------ > > --+-- > > ---------------------------------+---------+-----------+-------- 3 | > > SUSE_Enterprise_Storage_5_x86_64:SUSE-Enterprise-Storage-5-Pool | > > SUSE-Enterprise-Storage-5-Pool | Yes | (r ) Yes | No 5 | > > SUSE_Enterprise_Storage_5_x86_64:SUSE-Enterprise-Storage-5-Updates | > > SUSE-Enterprise-Storage-5-Updates | Yes | (r ) Yes | Yes 8 | > > SUSE_Linux_Enterprise_Server_12_SP3_x86_64:SLES12-SP3-Pool | > > SLES12-SP3-Pool | Yes | (r ) Yes | No 10 | > > SUSE_Linux_Enterprise_Server_12_SP3_x86_64:SLES12-SP3-Updates | > > SLES12-SP3-Updates | Yes | (r ) Yes | Yes > > **DNS** all nodes resolve bidirectionally. 
Azure cares for DNS but > > I?ve also updated hosts files. > > salt:~ # hostname > > > salt > > salt:~ # ping salt > > PING salt.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net > > (127.0.0.1) > > 56(84) bytes of data. > > 64 bytes from > > > salt.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net (127.0.0.1): > > icmp_seq=1 ttl=64 time=0.030 ms > > 104.211.27.224 Outside NAT to 172.19.20.10 > > salt.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net salt > > 172.19.20.12 > > > mon1.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net mon1 > > > > 172.19.20.13 > > mon2.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net > > mon2 172.19.20.14 > > mon3.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net mon3 > > 172.19.20.15 > > > osd1.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net osd1 > > > > 172.19.20.16 > > osd2.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net > > osd2 172.19.20.17 > > osd3.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net osd3 > > 172.19.20.18 > > > igw1.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net ogw1 > > > > **NTP** all nodes 5 minutes sync interval to same Stratum 1 server in > > same > > GEO as Azure AZ: (US East) navobs1.gatech.edu as shown: > > bash-3.2$ pssh -h > > > pssh-hosts -l sesuser -i sudo ntpq -p > > [1] 11:15:27 [SUCCESS] mon3 > > > > remote refid st t when poll reach delay offset > > > > > > jitter > > ====================================================================== > > ===== === *navobs1.gatech. .GPS. 1 u 19 64 1 > > 15.596 -4.863 0.333 [2] 11:15:27 [SUCCESS] salt > > > > remote refid st t when poll reach delay offset > > > > > > jitter > > ====================================================================== > > ===== === navobs1.gatech. .GPS. 1 u 42 64 1 > > 17.063 -6.702 0.000 [3] 11:15:27 [SUCCESS] igw1 > > > > remote refid st t when poll reach delay offset > > > > > > jitter > > ====================================================================== > > ===== === *navobs1.gatech. .GPS. 1 u 18 64 1 > > 17.394 -27.874 7.663 [4] 11:15:27 [SUCCESS] osd1 > > > > remote refid st t when poll reach delay offset > > > > > > jitter > > ====================================================================== > > ===== === *navobs1.gatech. .GPS. 1 u 21 64 1 > > 16.962 -3.755 0.813 [5] 11:15:27 [SUCCESS] osd2 > > > > remote refid st t when poll reach delay offset > > > > > > jitter > > ====================================================================== > > ===== === *navobs1.gatech. .GPS. 1 u 22 64 1 > > 15.832 -4.709 3.062 [6] 11:15:27 [SUCCESS] osd3 > > > > remote refid st t when poll reach delay offset > > > > > > jitter > > ====================================================================== > > ===== === *navobs1.gatech. .GPS. 1 u 26 64 1 > > 15.877 -3.252 19.131 [7] 11:15:27 [SUCCESS] mon1 > > > > remote refid st t when poll reach delay offset > > > > > > jitter > > ====================================================================== > > ===== === navobs1.gatech. .GPS. 1 u 2 64 1 > > 16.120 -4.263 0.000 [8] 11:15:27 [SUCCESS] mon2 > > > > remote refid st t when poll reach delay offset > > > > > > jitter > > ====================================================================== > > ===== === navobs1.gatech. .GPS. 
> > 1 u 2 64 1 16.108 -7.713 0.959
> >
> > **SALT**
> > salt:~ # systemctl status salt-api salt-master salt-minion |grep 'active (running)'
> > Active: active (running) since Wed 2018-09-19 18:14:39 UTC; 13min ago
> > Active: active (running) since Wed 2018-09-19 18:14:42 UTC; 13min ago
> > Active: active (running) since Wed 2018-09-19 18:14:41 UTC; 13min ago
> > salt:~ # systemctl status salt-api salt-master salt-minion |grep ERROR
> > Sep 19 18:14:49 salt salt-minion[1413]: [ERROR ] Function cephimages.list in mine_functions failed to execute
> >
> > salt:~ # salt-key --list-all
> > Accepted Keys:
> > igw1
> > mon1
> > mon2
> > mon3
> > osd1
> > osd2
> > osd3
> > salt
> > Denied Keys:
> > Unaccepted Keys:
> > Rejected Keys:
> >
> > salt:~ # salt '*' test.ping
> > salt:
> >     True
> > osd2:
> >     True
> > mon3:
> >     True
> > osd3:
> >     True
> > osd1:
> >     True
> > mon2:
> >     True
> > igw1:
> >     True
> > mon1:
> >     True
> >
> > salt:~ # cat /srv/pillar/ceph/master_minion.sls
> > master_minion: salt
> >
> > salt:~ # cat /srv/pillar/ceph/deepsea_minions.sls
> > ...
> > # Choose all minions
> > deepsea_minions: '*'
> > ...
> >
> > **SALT STAGES**
> > Stage 0 is successful with no errors but does not create the proposals folder.
> >
> > salt:~ # salt-run state.orch ceph.stage.prep
> > deepsea_minions          : valid
> > master_minion            : valid
> > ceph_version             : valid
> > [WARNING ] All minions are ready
> > salt_master:
> >   Name: sync master - Function: salt.state - Result: Changed Started: - 18:44:20.440255 Duration: 949.98 ms
> >   Name: salt-api - Function: salt.state - Result: Changed Started: - 18:44:21.390365 Duration: 3256.749 ms
> >   Name: repo master - Function: salt.state - Result: Clean Started: - 18:44:24.647227 Duration: 351.0 ms
> >   Name: metapackage master - Function: salt.state - Result: Clean Started: - 18:44:24.998333 Duration: 1127.063 ms
> >   Name: prepare master - Function: salt.state - Result: Changed Started: - 18:44:26.125514 Duration: 4109.917 ms
> >   Name: filequeue.remove - Function: salt.runner - Result: Changed Started: - 18:44:30.235610 Duration: 2071.199 ms
> >   Name: restart master - Function: salt.state - Result: Clean Started: - 18:44:32.306972 Duration: 1006.268 ms
> >   Name: filequeue.add - Function: salt.runner - Result: Changed Started: - 18:44:33.313369 Duration: 1352.98 ms
> >   Name: minions.ready - Function: salt.runner - Result: Changed Started: - 18:44:34.666528 Duration: 1891.677 ms
> >   Name: repo - Function: salt.state - Result: Clean Started: - 18:44:36.558363 Duration: 553.342 ms
> >   Name: metapackage minions - Function: salt.state - Result: Clean Started: - 18:44:37.111825 Duration: 3993.733 ms
> >   Name: common packages - Function: salt.state - Result: Clean Started: - 18:44:41.105706 Duration: 2434.079 ms
> >   Name: sync - Function: salt.state - Result: Changed Started: - 18:44:43.539897 Duration: 1381.692 ms
> >   Name: mines - Function: salt.state - Result: Clean Started: - 18:44:44.921708 Duration: 1657.019 ms
> >   Name: updates - Function: salt.state - Result: Changed Started: - 18:44:46.578853 Duration: 11183.347 ms
> >   Name: restart - Function: salt.state - Result: Clean Started: - 18:44:57.762346 Duration: 1553.957 ms
> >   Name: mds restart noop - Function: test.nop - Result: Clean Started: - 18:44:59.316442 Duration: 0.348 ms
> >
> > Summary for salt_master
> > -------------
> > Succeeded: 17 (changed=8)
> > Failed:     0
> > -------------
> > Total states run:     17
> > Total run time:   38.874 s
> >
> > Before running Stage 1, the /srv/pillar/ceph/proposals directory does not exist.
> > salt:~ # ls /srv/pillar/ceph/proposals/
> > ls: cannot access '/srv/pillar/ceph/proposals/': No such file or directory
> >
> > That's where I'm at ... Googling..
> >
> > ~ Kevin
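For reference, and not a command sequence taken from the thread: in a standard DeepSea/SES 5 layout the /srv/pillar/ceph/proposals tree is generated by stage 1 (discovery), not stage 0, so its absence right after stage 0 is expected; the sketch below (default paths assumed, nothing cluster-specific) shows the usual check once salt-api is healthy.

```bash
# Minimal sketch, assuming the default DeepSea paths: stage 1 (discovery) is what
# writes /srv/pillar/ceph/proposals, so verify salt-api and then run discovery.
systemctl --no-pager status salt-api        # must be active for the DeepSea runners
salt-run state.orch ceph.stage.discovery    # a.k.a. stage 1
ls /srv/pillar/ceph/proposals/              # should now contain role-*/ and profile-*/ entries
```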
_______________________________________________
Deepsea-users mailing list
Deepsea-users at lists.suse.com
http://lists.suse.com/mailman/listinfo/deepsea-users

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From kevin.ayres at suse.com  Fri Sep 21 12:18:06 2018
From: kevin.ayres at suse.com (Kevin Ayres)
Date: Fri, 21 Sep 2018 18:18:06 +0000
Subject: [Deepsea-users] Stage 2 error on Azure
Message-ID:

I redeployed this SES 5 cluster, ripped out all of the Azure naming junk, and moved a local DNS server onto the Adm node. So DNS, NTP, resolv.conf, hosts, and .ssh/ files appear to be correct throughout. Stages 0 and 1 complete; stage 2 fails.
I'm troubleshooting this stage 2 error. Something to do with keyring caching, and possibly the manager role running on the admin node? I've restarted services, the node, etc. It seems to be a minion issue on the salt master (adm) node. Any guidance is appreciated. Googling madly..

salt:
    Data failed to compile:
----------
    Rendering SLS 'base:ceph.mgr.key.default' failed: Conflicting ID '/srv/salt/ceph/mgr/cache/localhost.keyring'

salt:~ # ls -a /srv/salt/ceph/mgr/cache/
.  ..
No such folder..

salt:~ # cat /srv/pillar/ceph/proposals/policy.cfg
...
role-mgr/cluster/mon*.sls

salt:~ # cat /srv/pillar/ceph/proposals/role-mgr/cluster/salt.sls
roles:
- mgr

Thank you!
~ Kevin

Complete stage 2 output:

salt:~ # salt-run state.orch ceph.stage.configure
deepsea_minions          : valid
yaml_syntax              : valid
profiles_populated       : valid
public network           : 172.19.20.0/24
cluster network          : 172.19.20.0/24
[ERROR ] Run failed on minions: salt
Failures:
    salt:
        Data failed to compile:
    ----------
        Rendering SLS 'base:ceph.mgr.key.default' failed: Conflicting ID '/srv/salt/ceph/mgr/cache/localhost.keyring'

salt_master:
  Name: push.proposal - Function: salt.runner - Result: Changed Started: - 17:04:46.183848 Duration: 1392.712 ms
  Name: refresh_pillar1 - Function: salt.state - Result: Changed Started: - 17:04:47.576691 Duration: 784.477 ms
  Name: advise.networks - Function: salt.runner - Result: Clean Started: - 17:04:48.361300 Duration: 1766.973 ms
  Name: admin key - Function: salt.state - Result: Clean Started: - 17:04:50.128391 Duration: 404.333 ms
  Name: mon key - Function: salt.state - Result: Changed Started: - 17:04:50.532926 Duration: 563.858 ms
----------
          ID: mgr key
    Function: salt.state
      Result: False
     Comment: Run failed on minions: salt
              Failures:
                  salt:
                      Data failed to compile:
                  ----------
                      Rendering SLS 'base:ceph.mgr.key.default' failed: Conflicting ID '/srv/salt/ceph/mgr/cache/localhost.keyring'
     Started: 17:04:51.096948
    Duration: 2690.64 ms
     Changes:

Summary for salt_master
------------
Succeeded: 5 (changed=3)
Failed:    1
------------
Total states run:     6
Total run time:   7.603 s

salt:~ # service salt-minion status
* salt-minion.service - The Salt Minion
   Loaded: loaded (/usr/lib/systemd/system/salt-minion.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2018-09-20 23:53:42 UTC; 17h ago
 Main PID: 13456 (salt-minion)
    Tasks: 6 (limit: 512)
   CGroup: /system.slice/salt-minion.service
           |-13456 /usr/bin/python /usr/bin/salt-minion
           |-13462 /usr/bin/python /usr/bin/salt-minion
           `-13465 /usr/bin/python /usr/bin/salt-minion

Sep 21 13:03:37 salt salt-minion[13456]: [ERROR ] Exception occurred while handling stream: [Errno 0] Success
Sep 21 13:05:57 salt salt-minion[13456]: [ERROR ] Exception occurred while handling stream: [Errno 0] Success
Sep 21 13:50:19 salt salt-minion[13456]: [ERROR ] Exception occurred while handling stream: [Errno 0] Success
Sep 21 13:53:44 salt salt-minion[13456]: [ERROR ] Function cephimages.list in mine_functions failed to execute
Sep 21 14:47:32 salt salt-minion[13456]: [ERROR ] Exception occurred while handling stream: [Errno 0] Success
Sep 21 14:53:44 salt salt-minion[13456]: [ERROR ] Function cephimages.list in mine_functions failed to execute
Sep 21 15:05:03 salt salt-minion[13456]: [ERROR ] Exception occurred while handling stream: [Errno 0] Success
Sep 21 15:53:44 salt salt-minion[13456]: [ERROR ] Function cephimages.list in mine_functions failed to execute
Sep 21 16:53:44 salt salt-minion[13456]: [ERROR ] Function cephimages.list in mine_functions failed to execute
Sep 21 17:04:53 salt salt-minion[13456]: [CRITICAL] Rendering SLS 'base:ceph.mgr.key.default' failed: Conflicting ID '/srv/salt/ceph/mgr/cache/localhost.keyring'

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ejackson at suse.com  Fri Sep 21 12:55:26 2018
From: ejackson at suse.com (Eric Jackson)
Date: Fri, 21 Sep 2018 14:55:26 -0400
Subject: [Deepsea-users] Stage 2 error on Azure
In-Reply-To:
References:
Message-ID: <2179785.o9lRQUWKH7@fury.home>

Hi Kevin,
  Check your minion names. Try 'salt-key -L'. The reason for the "Conflicting ID" is that Salt will unroll a Jinja loop. For example, if you have three minions assigned the mgr role, then the preprocessing of /srv/salt/ceph/mgr/key/default.sls will create three separate stanzas. YAML requires unique identifiers.
  The key file names you expected in /srv/salt/ceph/mgr/cache would be minion1.keyring, minion2.keyring and minion3.keyring. However, you are getting localhost.keyring. So, you have at least two and likely three minions all replying to the Salt master that they are "localhost".

  Check on each of your salt minions the value in /etc/salt/minion_id. If that is incorrect (and says "localhost"), delete the minion from the Salt master, correct the minion_id file, restart the salt minion and then accept the key on the Salt master. The commands would be

admin# salt-key -d ID
minion1# vi /etc/salt/minion_id
minion1# systemctl restart salt-minion
admin# salt-key -A

  Once that is resolved, you can run just the ceph.mgr.key step to verify.

admin# salt 'admin*' state.apply ceph.mgr.key

  When that works, try Stage 2 again.

Eric
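A quick way to see which ID each minion is actually reporting, sketched here with plain Salt commands (nothing DeepSea-specific assumed):

```bash
# Compare the key names the master holds with what each minion says about itself.
salt-key -L                                  # key names known to the master
salt '*' grains.item id host fqdn            # the 'id' grain is what the Jinja loop keys on
salt '*' cmd.run 'cat /etc/salt/minion_id'   # the on-disk minion ID
# Any minion that answers as "localhost" is one of the colliding entries behind
# the duplicate localhost.keyring stanza described above.
```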
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 473 bytes
Desc: This is a digitally signed message part.
URL:

From kevin.ayres at suse.com  Sat Sep 22 23:26:35 2018
From: kevin.ayres at suse.com (Kevin Ayres)
Date: Sun, 23 Sep 2018 05:26:35 +0000
Subject: [Deepsea-users] Stage 2 error on Azure
In-Reply-To: <2179785.o9lRQUWKH7@fury.home>
References: <2179785.o9lRQUWKH7@fury.home>
Message-ID: <2BD5F895-B84E-4A19-84A3-A853E1B692A6@suse.com>

Thanks Eric. No joy, I'm missing something here..

salt:~ # salt-key -L
Accepted Keys:
igw1
mon1
mon2
mon3
osd1
osd2
osd3
salt

salt:~ # ls -a /srv/salt/ceph/mgr/cache
.  ..

salt:~ # salt-key -d salt
The following keys are going to be deleted:
Accepted Keys:
salt

salt:~ # vi /etc/salt/minion_id

salt:~ # salt-key -A
The following keys are going to be accepted:
Unaccepted Keys:
salt
Proceed? [n/Y] y
Key for minion salt accepted.

salt:~ # salt 'admin*' state.apply ceph.mgr.key
No minions matched the target. No command was sent, no jid was assigned.
ERROR: No return received

salt:~ # salt 'salt*' state.apply ceph.mgr.key
salt:
    Data failed to compile:
----------
    Rendering SLS 'base:ceph.mgr.key.default' failed: Conflicting ID '/srv/salt/ceph/mgr/cache/localhost.keyring'
ERROR: Minions returned with non-zero exit code

salt:~ # salt-run state.orch ceph.stage.configure
deepsea_minions          : valid
yaml_syntax              : valid
profiles_populated       : valid
public network           : 172.19.20.0/24
cluster network          : 172.19.20.0/24
[ERROR ] Run failed on minions: salt
Failures:
    salt:
        Data failed to compile:
    ----------
        Rendering SLS 'base:ceph.mgr.key.default' failed: Conflicting ID '/srv/salt/ceph/mgr/cache/localhost.keyring'

salt_master:
  Name: push.proposal - Function: salt.runner - Result: Changed Started: - 05:22:09.015323 Duration: 1330.864 ms
  Name: refresh_pillar1 - Function: salt.state - Result: Changed Started: - 05:22:10.346301 Duration: 759.065 ms
  Name: advise.networks - Function: salt.runner - Result: Clean Started: - 05:22:11.105561 Duration: 1699.562 ms
  Name: admin key - Function: salt.state - Result: Clean Started: - 05:22:12.805243 Duration: 391.792 ms
  Name: mon key - Function: salt.state - Result: Changed Started: - 05:22:13.197149 Duration: 381.892 ms
----------
          ID: mgr key
    Function: salt.state
      Result: False
     Comment: Run failed on minions: salt
              Failures:
                  salt:
                      Data failed to compile:
                  ----------
                      Rendering SLS 'base:ceph.mgr.key.default' failed: Conflicting ID '/srv/salt/ceph/mgr/cache/localhost.keyring'
     Started: 05:22:13.579153
    Duration: 2510.463 ms
     Changes:

Summary for salt_master
------------
Succeeded: 5 (changed=3)
Failed:    1
------------
Total states run:     6
Total run time:   7.074 s

~ Kevin
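One detail worth noting: the transcript above edits /etc/salt/minion_id but does not show a salt-minion restart before the key is re-accepted, and Eric's sequence includes one. A compact version of that sequence, using this cluster's 'salt' minion purely as the example:

```bash
# Sketch of the rename sequence from Eric's reply (minion 'salt' used as the example).
salt-key -d salt                          # admin node: delete the stale key
# on the affected minion:
#   echo salt > /etc/salt/minion_id
#   systemctl restart salt-minion         # required before the new key is presented
salt-key -A                               # admin node: accept the re-submitted key
salt 'salt*' state.apply ceph.mgr.key     # re-test just the failing step
```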
From dbyte at suse.com  Sat Sep 22 23:35:17 2018
From: dbyte at suse.com (David Byte)
Date: Sun, 23 Sep 2018 05:35:17 +0000
Subject: [Deepsea-users] Stage 2 error on Azure
In-Reply-To: <2BD5F895-B84E-4A19-84A3-A853E1B692A6@suse.com>
References: <2179785.o9lRQUWKH7@fury.home> <2BD5F895-B84E-4A19-84A3-A853E1B692A6@suse.com>
Message-ID: <0CF256E4-87D5-48C8-A76E-779C3D70ABB0@suse.com>

I know you are going to love this, but I suspect you have stuff cached that needs to be cleaned up. The issues you are seeing are exactly the things where I use my cleanit.sh script to wipe it all out and allow me to start over.

David Byte
Sr. Technology Strategist
SCE Enterprise Linux
SCE Enterprise Storage
Alliances and SUSE Embedded
dbyte at suse.com
918.528.4422
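David's cleanit.sh itself is not posted in the thread; as a rough sketch of the kind of cache cleanup he is pointing at, the stock Salt equivalents would be:

```bash
# Not David's cleanit.sh -- just generic Salt cache hygiene covering similar ground:
# stale grains/pillar/mine data cached on the master and the minions.
salt '*' saltutil.clear_cache      # clear each minion's local cache
salt '*' saltutil.refresh_pillar   # re-render pillar data on the minions
salt '*' mine.update               # repopulate the Salt mine
salt-run cache.clear_all tgt='*'   # drop the master's cached grains/pillar/mine for all minions
```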
_______________________________________________
Deepsea-users mailing list
Deepsea-users at lists.suse.com
http://lists.suse.com/mailman/listinfo/deepsea-users

From tserong at suse.com  Sun Sep 23 23:23:58 2018
From: tserong at suse.com (Tim Serong)
Date: Mon, 24 Sep 2018 15:23:58 +1000
Subject: [Deepsea-users] Stage 2 error on Azure
In-Reply-To: <0CF256E4-87D5-48C8-A76E-779C3D70ABB0@suse.com>
References: <2179785.o9lRQUWKH7@fury.home> <2BD5F895-B84E-4A19-84A3-A853E1B692A6@suse.com> <0CF256E4-87D5-48C8-A76E-779C3D70ABB0@suse.com>
Message-ID: <4b5517a2-1224-8c38-9b09-4e7805062870@suse.com>

Have a look in /etc/hosts, and make sure the system hostname does *not*
appear on either the IPv4 or IPv6 localhost lines. That was the problem
last time I saw this issue (presumably something was resolving the
hostname, getting 127.0.0.1 or ::1 back, then doing a reverse lookup on
that address, and finishing up with localhost).

Regards,

Tim
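A minimal check for what Tim describes, run on each node (the hostname is taken from the node itself; nothing cluster-specific assumed):

```bash
# The system hostname must not resolve to a loopback address, otherwise minions
# end up identifying themselves as "localhost".
grep -nE '^(127\.0\.0\.1|::1)' /etc/hosts   # the hostname should NOT appear on these lines
getent hosts "$(hostname)"                  # should return the node's real IP, not 127.0.0.1/::1
hostname -f                                 # should print the proper FQDN, not localhost
```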
--
Tim Serong
Senior Clustering Engineer
SUSE
tserong at suse.com