From knighthoot at gmail.com Mon Sep 10 07:19:26 2018
From: knighthoot at gmail.com (gna bla)
Date: Mon, 10 Sep 2018 15:19:26 +0200
Subject: [Deepsea-users] Bug with "/srv/salt/ceph/updates/restart/default.sls"
Message-ID:

Hello everyone,

For a small test project, I was asked to try out openATTIC with Ceph. Obviously I decided to use DeepSea, as the OA docs suggested. I ran into a bug in the file /srv/salt/ceph/updates/restart/default.sls.

My setup was 3 servers. I had been following your readme.md and everything was fine and dandy until I had to execute

# salt-run state.orch ceph.stage.0

It spewed a couple of errors, and together with my boss I searched for a solution. The problem was the file mentioned above; specifically, it had problems on line 3 with the "rpm" command. The default code didn't work for us, so we tried a workaround mentioned here:
https://github.com/saltstack/salt/issues/43569#issuecomment-330209788

With this workaround the third line looks like this:

{% set installed = salt['cmd.run']('/bin/sh -c "rpm -q --last kernel-default |head -1 |cut -f1 -d\ "') | replace('kernel-default-', '') %}

With the new code, everything worked fine. I honestly don't know what causes this or why it fails exactly there, but the workaround helped.

I didn't mention this above, but with the original code I got some error messages hinting at "rpm: -1: unknown", or something like that. It seems the program was able to find the kernel version but was unable to parse it. I could be wrong on this one, though, as I am not a developer :-)

Thank you for taking the time to read this.

Kind regards,
MrPiano
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
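For anyone who wants to see what that Jinja line evaluates to on a node, here is a minimal sketch of the same pipeline as a plain shell script, assuming a SUSE host with the kernel-default package installed; the variable names are illustrative only and are not part of DeepSea.

```bash
#!/bin/sh
# List installed kernel-default packages newest-first, keep the first line,
# and take the first space-separated field (the full package name,
# e.g. "kernel-default-<version>.<arch>").
newest_pkg=$(rpm -q --last kernel-default | head -1 | cut -f1 -d' ')

# Strip the "kernel-default-" prefix, mirroring the Jinja
# "| replace('kernel-default-', '')" filter, to leave only the version string.
installed=${newest_pkg#kernel-default-}
echo "$installed"
```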
From kevin.ayres at suse.com Tue Sep 18 16:49:07 2018
From: kevin.ayres at suse.com (Kevin Ayres)
Date: Tue, 18 Sep 2018 22:49:07 +0000
Subject: [Deepsea-users] stage 1 errors on Azure
Message-ID:

Hey guys, I can't seem to get past stage 1. Stage 0 completes successfully. Same output with the deepsea command. The master and minion services are running and bidirectional host resolution is good. Keys are all accepted. From what I can determine, the default files are not created by stage 0 for some reason. Thoughts?

What I'm seeing is that it fails to create /srv/pillar/ceph/proposals. I'm running through this doc line by line:
https://www.suse.com/documentation/suse-enterprise-storage-5/singlehtml/book_storage_deployment/book_storage_deployment.html#deepsea.cli

~ Kevin

salt:~ # salt-run state.orch ceph.stage.discovery
salt-api : ["Salt API is failing to authenticate - try 'systemctl restart salt-master': list index out of range"]
deepsea_minions : valid
master_minion : valid
ceph_version : valid
[ERROR ] No highstate or sls specified, no execution made
salt_master:
----------
ID: salt-api failed
Function: salt.state
Name: just.exit
Result: False
Comment: No highstate or sls specified, no execution made
Started: 22:30:53.628882
Duration: 0.647 ms
Changes:

Summary for salt_master
------------
Succeeded: 0
Failed: 1
------------
Total states run: 1
Total run time: 0.647 ms

salt:~ # !tail
tail -f /var/log/salt/master
2018-09-18 22:29:08,797 [salt.loaded.ext.runners.validate][WARNING ][8499] role-igw/cluster/igw*.sls matched no files
2018-09-18 22:29:08,797 [salt.loaded.ext.runners.validate][WARNING ][8499] role-openattic/cluster/salt.sls matched no files
2018-09-18 22:29:08,797 [salt.loaded.ext.runners.validate][WARNING ][8499] config/stack/default/global.yml matched no files
2018-09-18 22:29:08,798 [salt.loaded.ext.runners.validate][WARNING ][8499] config/stack/default/ceph/cluster.yml matched no files
2018-09-18 22:29:08,798 [salt.loaded.ext.runners.validate][WARNING ][8499] cluster/*.sls matched no files
2018-09-18 22:29:08,798 [salt.loaded.ext.runners.validate][WARNING ][8499] stack/default/ceph/minions/*.yml matched no files
2018-09-18 22:29:08,822 [salt.state ][ERROR ][8499] No highstate or sls specified, no execution made
2018-09-18 22:29:52,472 [salt.transport.ipc][ERROR ][5672] Exception occurred while handling stream: [Errno 0] Success
2018-09-18 22:29:56,797 [salt.state ][ERROR ][8759] No highstate or sls specified, no execution made
2018-09-18 22:30:53,629 [salt.state ][ERROR ][9272] No highstate or sls specified, no execution made

There's also some issue with the salt-minion.service:

● salt-minion.service - The Salt Minion
   Loaded: loaded (/usr/lib/systemd/system/salt-minion.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2018-09-18 22:46:54 UTC; 12s ago
 Main PID: 11082 (salt-minion)
.....
Sep 18 22:46:54 salt systemd[1]: Started The Salt Minion.
Sep 18 22:47:00 salt salt-minion[11082]: [ERROR ] Function cephimages.list in mine_functions failed to execute
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From joel.zhou at suse.com Wed Sep 19 00:35:04 2018
From: joel.zhou at suse.com (Joel Zhou)
Date: Wed, 19 Sep 2018 06:35:04 +0000
Subject: [Deepsea-users] stage 1 errors on Azure
Message-ID: <12D2E30B-9CAC-4A7E-987B-28AD8A4E5D32@suse.com>

Hi Kevin,

My short answer is:

Step 1, before stage 0, check your salt-api service on the salt-master node first.
```bash
zypper install -y salt-api
systemctl enable salt-api.service
systemctl start salt-api.service
```
Step 2, make sure the NTP service works correctly on all nodes, which means time is synchronized correctly on all nodes.
Step 3, reboot all your nodes, if acceptable, in case the kernel was updated somehow.
Step 4, then you have to start over again from stage 0 to 5.

Basically, DeepSea is a bunch of Salt scripts, and Salt is based on python2 and/or python3. I have no clue about your whole running stack, so I assume SLES 12 SP3 + SES 5, which works fine and is supported.
More info would be helpful, and also your purpose, such as practice on your own, or PoC/testing to meet a customer's demands.

Regards,

--
Joel Zhou
Senior Storage Technologist, APJ

Mobile: +86 18514577601
Email: joel.zhou at suse.com

From: on behalf of Kevin Ayres
Reply-To: Discussions about the DeepSea management framework for Ceph
Date: Tuesday, September 18, 2018 at 4:49 PM
To: "deepsea-users at lists.suse.com"
Subject: [Deepsea-users] stage 1 errors on Azure

[...]
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
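Joel's checklist can also be run from the master in a couple of commands; the following is a rough sketch of those pre-flight checks, assuming a working salt CLI and the ntpq tool on every minion. These are ordinary Salt and systemd calls, not something DeepSea itself ships.

```bash
#!/bin/bash
# Step 1: make sure salt-api is installed, enabled and running on the master
# before starting stage 0.
zypper install -y salt-api
systemctl enable --now salt-api.service
systemctl --no-pager status salt-api.service salt-master.service salt-minion.service

# Step 2: confirm every minion answers and has its clock synchronized.
salt '*' test.ping
salt '*' cmd.run 'ntpq -p'   # look for a '*' next to the selected time source
salt '*' cmd.run 'date -u'   # timestamps should agree within a few seconds
```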
From kevin.ayres at suse.com Wed Sep 19 13:20:43 2018
From: kevin.ayres at suse.com (Kevin Ayres)
Date: Wed, 19 Sep 2018 19:20:43 +0000
Subject: [Deepsea-users] stage 1 errors on Azure
In-Reply-To: <12D2E30B-9CAC-4A7E-987B-28AD8A4E5D32@suse.com>
References: <12D2E30B-9CAC-4A7E-987B-28AD8A4E5D32@suse.com>
Message-ID:

Thanks Joel, yes, DNS and NTP are configured and behaving correctly. SP3/SES5 from the current repo. The salt-api, master, and minion services are running (with one error). I'm walking through the Deployment guide line by line with the same result, now on my second freshly built master node. Salt output is at the bottom of this message. Key point: after stage 0, the */proposals directory has NOT been created.

Here's my build on a single flat network (Azure vNet 172.19.20.0/24):
Root ssh enabled and key-based login from master to all nodes as root. All nodes rebooted before the salt stages. All nodes using an identical image and fully patched, CPE_NAME="cpe:/o:suse:sles:12:sp3", firewall off, etc. - the Azure instance defaults.

Salt (and all nodes):~ # zypper lr -E
Repository priorities are without effect. All enabled repositories share the same priority.
 # | Alias                                                               | Name                              | Enabled | GPG Check | Refresh
---+---------------------------------------------------------------------+-----------------------------------+---------+-----------+--------
 3 | SUSE_Enterprise_Storage_5_x86_64:SUSE-Enterprise-Storage-5-Pool     | SUSE-Enterprise-Storage-5-Pool    | Yes     | (r ) Yes  | No
 5 | SUSE_Enterprise_Storage_5_x86_64:SUSE-Enterprise-Storage-5-Updates  | SUSE-Enterprise-Storage-5-Updates | Yes     | (r ) Yes  | Yes
 8 | SUSE_Linux_Enterprise_Server_12_SP3_x86_64:SLES12-SP3-Pool          | SLES12-SP3-Pool                   | Yes     | (r ) Yes  | No
10 | SUSE_Linux_Enterprise_Server_12_SP3_x86_64:SLES12-SP3-Updates       | SLES12-SP3-Updates                | Yes     | (r ) Yes  | Yes

**DNS** all nodes resolve bidirectionally. Azure takes care of DNS but I've also updated the hosts files.
salt:~ # hostname
salt
salt:~ # ping salt
PING salt.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net (127.0.0.1) 56(84) bytes of data.
64 bytes from salt.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net (127.0.0.1): icmp_seq=1 ttl=64 time=0.030 ms

104.211.27.224 Outside NAT to 172.19.20.10 salt.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net salt
172.19.20.12 mon1.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net mon1
172.19.20.13 mon2.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net mon2
172.19.20.14 mon3.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net mon3
172.19.20.15 osd1.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net osd1
172.19.20.16 osd2.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net osd2
172.19.20.17 osd3.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net osd3
172.19.20.18 igw1.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net ogw1

**NTP** all nodes on a 5-minute sync interval to the same stratum 1 server in the same geo as the Azure AZ (US East), navobs1.gatech.edu, as shown:
bash-3.2$ pssh -h pssh-hosts -l sesuser -i sudo ntpq -p
[1] 11:15:27 [SUCCESS] mon3
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*navobs1.gatech. .GPS.            1 u   19   64    1   15.596   -4.863   0.333
[2] 11:15:27 [SUCCESS] salt
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
navobs1.gatech. .GPS.
1 u 42 64 1 17.063 -6.702 0.000 [3] 11:15:27 [SUCCESS] igw1 remote refid st t when poll reach delay offset jitter ============================================================================== *navobs1.gatech. .GPS. 1 u 18 64 1 17.394 -27.874 7.663 [4] 11:15:27 [SUCCESS] osd1 remote refid st t when poll reach delay offset jitter ============================================================================== *navobs1.gatech. .GPS. 1 u 21 64 1 16.962 -3.755 0.813 [5] 11:15:27 [SUCCESS] osd2 remote refid st t when poll reach delay offset jitter ============================================================================== *navobs1.gatech. .GPS. 1 u 22 64 1 15.832 -4.709 3.062 [6] 11:15:27 [SUCCESS] osd3 remote refid st t when poll reach delay offset jitter ============================================================================== *navobs1.gatech. .GPS. 1 u 26 64 1 15.877 -3.252 19.131 [7] 11:15:27 [SUCCESS] mon1 remote refid st t when poll reach delay offset jitter ============================================================================== navobs1.gatech. .GPS. 1 u 2 64 1 16.120 -4.263 0.000 [8] 11:15:27 [SUCCESS] mon2 remote refid st t when poll reach delay offset jitter ============================================================================== navobs1.gatech. .GPS. 1 u 2 64 1 16.108 -7.713 0.959 **SALT** salt:~ # systemctl status salt-api salt-master salt-minion |grep 'active (running)' Active: active (running) since Wed 2018-09-19 18:14:39 UTC; 13min ago Active: active (running) since Wed 2018-09-19 18:14:42 UTC; 13min ago Active: active (running) since Wed 2018-09-19 18:14:41 UTC; 13min ago salt:~ # systemctl status salt-api salt-master salt-minion |grep ERROR Sep 19 18:14:49 salt salt-minion[1413]: [ERROR ] Function cephimages.list in mine_functions failed to execute salt:~ # salt-key --list-all Accepted Keys: igw1 mon1 mon2 mon3 osd1 osd2 osd3 salt Denied Keys: Unaccepted Keys: Rejected Keys: salt:~ # salt '*' test.ping salt: True osd2: True mon3: True osd3: True osd1: True mon2: True igw1: True mon1: True salt:~ # cat /srv/pillar/ceph/master_minion.sls master_minion: salt salt:~ # cat /srv/pillar/ceph/deepsea_minions.sls ... # Choose all minions deepsea_minions: '*' ... **SALT STAGES** Stage 0 is successful with no errors but does not create the proposals folder. 
salt:~ # salt-run state.orch ceph.stage.prep
deepsea_minions : valid
master_minion : valid
ceph_version : valid
[WARNING ] All minions are ready
salt_master:
  Name: sync master - Function: salt.state - Result: Changed Started: - 18:44:20.440255 Duration: 949.98 ms
  Name: salt-api - Function: salt.state - Result: Changed Started: - 18:44:21.390365 Duration: 3256.749 ms
  Name: repo master - Function: salt.state - Result: Clean Started: - 18:44:24.647227 Duration: 351.0 ms
  Name: metapackage master - Function: salt.state - Result: Clean Started: - 18:44:24.998333 Duration: 1127.063 ms
  Name: prepare master - Function: salt.state - Result: Changed Started: - 18:44:26.125514 Duration: 4109.917 ms
  Name: filequeue.remove - Function: salt.runner - Result: Changed Started: - 18:44:30.235610 Duration: 2071.199 ms
  Name: restart master - Function: salt.state - Result: Clean Started: - 18:44:32.306972 Duration: 1006.268 ms
  Name: filequeue.add - Function: salt.runner - Result: Changed Started: - 18:44:33.313369 Duration: 1352.98 ms
  Name: minions.ready - Function: salt.runner - Result: Changed Started: - 18:44:34.666528 Duration: 1891.677 ms
  Name: repo - Function: salt.state - Result: Clean Started: - 18:44:36.558363 Duration: 553.342 ms
  Name: metapackage minions - Function: salt.state - Result: Clean Started: - 18:44:37.111825 Duration: 3993.733 ms
  Name: common packages - Function: salt.state - Result: Clean Started: - 18:44:41.105706 Duration: 2434.079 ms
  Name: sync - Function: salt.state - Result: Changed Started: - 18:44:43.539897 Duration: 1381.692 ms
  Name: mines - Function: salt.state - Result: Clean Started: - 18:44:44.921708 Duration: 1657.019 ms
  Name: updates - Function: salt.state - Result: Changed Started: - 18:44:46.578853 Duration: 11183.347 ms
  Name: restart - Function: salt.state - Result: Clean Started: - 18:44:57.762346 Duration: 1553.957 ms
  Name: mds restart noop - Function: test.nop - Result: Clean Started: - 18:44:59.316442 Duration: 0.348 ms

Summary for salt_master
-------------
Succeeded: 17 (changed=8)
Failed: 0
-------------
Total states run: 17
Total run time: 38.874 s

Before running Stage 1, the /srv/pillar/ceph/proposals directory does not exist.
salt:~ # ls /srv/pillar/ceph/proposals/
ls: cannot access '/srv/pillar/ceph/proposals/': No such file or directory

That's where I'm at ... Googling.

~ Kevin

From: on behalf of Joel Zhou
Reply-To: Discussions about the DeepSea management framework for Ceph
Date: Tuesday, September 18, 2018 at 11:34 PM
To: Discussions about the DeepSea management framework for Ceph
Subject: Re: [Deepsea-users] stage 1 errors on Azure

[...]
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From ejackson at suse.com Wed Sep 19 13:37:00 2018
From: ejackson at suse.com (Eric Jackson)
Date: Wed, 19 Sep 2018 15:37:00 -0400
Subject: [Deepsea-users] stage 1 errors on Azure
In-Reply-To:
References: <12D2E30B-9CAC-4A7E-987B-28AD8A4E5D32@suse.com>
Message-ID: <1624041.v2sRp242nD@fury.home>

Hi Kevin,
  Stage 0 only does the "preparation" part. That is, sync'ing Salt modules, zypper updates, etc. Stage 1 is the "discovery" part that interrogates the minions and then creates the roles and storage fragments. If your salt-api issue is resolved, Stage 1 should run relatively quickly.

Eric

On Wednesday, September 19, 2018 3:20:43 PM EDT Kevin Ayres wrote:
> [...]
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 473 bytes
Desc: This is a digitally signed message part.
URL:
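For reference, the named stages used in this thread also have numbered aliases in DeepSea, so the same orchestrations can be written either way; a quick sketch using only the targets already shown above:

```bash
# Stage 0, "prep": sync Salt modules, apply updates, restart services if needed.
salt-run state.orch ceph.stage.0       # same run as: salt-run state.orch ceph.stage.prep

# Stage 1, "discovery": interrogate the minions and write the proposals tree.
salt-run state.orch ceph.stage.1       # same run as: salt-run state.orch ceph.stage.discovery

# Only after a successful discovery run should the proposals exist:
ls /srv/pillar/ceph/proposals/
```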
From kevin.ayres at suse.com Wed Sep 19 13:56:17 2018
From: kevin.ayres at suse.com (Kevin Ayres)
Date: Wed, 19 Sep 2018 19:56:17 +0000
Subject: [Deepsea-users] stage 1 errors on Azure
In-Reply-To: <1624041.v2sRp242nD@fury.home>
References: <12D2E30B-9CAC-4A7E-987B-28AD8A4E5D32@suse.com> <1624041.v2sRp242nD@fury.home>
Message-ID: <33994725-E346-4599-842A-042E5DBFA138@suse.com>

Thanks Eric, Yes, I understand this but worded it poorly. I don't see any issues with NTP or DNS. Something else is amiss. Should deepsea be installed after salt as outlined in the deployment doc, or before?
salt:~ # salt-run state.orch ceph.stage.discovery
salt-api : ["Salt API is failing to authenticate - try 'systemctl restart salt-master': list index out of range"]
deepsea_minions : valid
master_minion : valid
ceph_version : valid
[ERROR ] No highstate or sls specified, no execution made
salt_master:
----------
ID: salt-api failed
Function: salt.state
Name: just.exit
Result: False
Comment: No highstate or sls specified, no execution made
Started: 19:38:41.962044
Duration: 0.734 ms
Changes:

Summary for salt_master
------------
Succeeded: 0
Failed: 1
------------
Total states run: 1
Total run time: 0.734 ms

salt:~ # tail -f /var/log/salt/master
2018-09-19 18:44:36,555 [salt.loaded.ext.runners.minions][WARNING ][15319] All minions are ready
2018-09-19 19:38:41,955 [salt.transport.ipc][ERROR ][1626] Exception occurred while handling stream: [Errno 0] Success
2018-09-19 19:38:41,962 [salt.state ][ERROR ][40826] No highstate or sls specified, no execution made

salt:~ # ls /srv/pillar/ceph/proposals
ls: cannot access '/srv/pillar/ceph/proposals': No such file or directory

salt:~ # ls /srv/pillar/ceph/
benchmarks deepsea_minions.sls deepsea_minions.sls.rpmsave init.sls master_minion.sls master_minion.sls.rpmsave stack

~ Kevin

On 9/19/18, 12:37 PM, "deepsea-users-bounces at lists.suse.com on behalf of Eric Jackson" wrote:
[...]
From ejackson at suse.com Wed Sep 19 14:39:28 2018
From: ejackson at suse.com (Eric Jackson)
Date: Wed, 19 Sep 2018 16:39:28 -0400
Subject: [Deepsea-users] stage 1 errors on Azure
In-Reply-To: <33994725-E346-4599-842A-042E5DBFA138@suse.com>
References: <12D2E30B-9CAC-4A7E-987B-28AD8A4E5D32@suse.com> <1624041.v2sRp242nD@fury.home> <33994725-E346-4599-842A-042E5DBFA138@suse.com>
Message-ID: <7625052.XIqpMaYcJK@fury.home>

The rpm would be installed after Salt is configured. I understand some installations install both Salt and DeepSea via YaST. We did try to make accommodations for that scenario.

So, your salt-api is still down. We ran into drastically different reasons why the salt-api can fail. We did not have the bandwidth to address each type of failure. The check we do is a curl command to verify that the salt-api is answering.
curl -si localhost:8000/login -H "Accept: application/json" -d username=admin -d sharedsecret=xxx -d eauth=sharedsecret

The sharedsecret is in /etc/salt/master/sharedsecret.conf. It's possible to have the Salt master remember the previous contents of that file and the Salt API use the current contents if an admin does things just right :) . That's why we give the error message about restarting the Salt master. However, the above curl command can also fail because the salt-api is down, or maybe localhost is not defined in /etc/hosts (ran into that once). The curl command may shed more light on the failure.

***
As far as how you would have found that curl command: take a look at the contents of /srv/modules/runners/validate.py. About line 626, you will see the python code is literally calling the curl command. In other words, do not be intimidated by the python code. Much of it is calling some of the same command line tools that you would use. We are just using Salt to do much of this in parallel.

It's also possible to run this directly without invoking Stage 1.

# salt-run validate.saltapi

Although the validations can be frustrating, not having them is worse. The situations where we did not check for the Salt API led to incredibly painful debug sessions.

Eric

On Wednesday, September 19, 2018 3:56:17 PM EDT Kevin Ayres wrote:
> [...]
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 473 bytes
Desc: This is a digitally signed message part.
URL:
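Eric's check can be wrapped in a couple of lines so the shared secret does not have to be pasted by hand. This is only a sketch: it assumes the file he names holds a single "sharedsecret: <value>" line, so adjust the awk if your layout differs.

```bash
#!/bin/bash
# Pull the shared secret out of the file Eric mentions and attempt the same
# login the DeepSea validation performs. A JSON reply with a token means the
# salt-api is answering; a connection error points at the service itself or
# at localhost resolution.
SECRET=$(awk '/sharedsecret:/ {print $2}' /etc/salt/master/sharedsecret.conf)

curl -si localhost:8000/login \
  -H "Accept: application/json" \
  -d username=admin \
  -d sharedsecret="$SECRET" \
  -d eauth=sharedsecret

# The same check, run through DeepSea's own validation runner:
salt-run validate.saltapi
```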
From kevin.ayres at suse.com Wed Sep 19 16:03:05 2018
From: kevin.ayres at suse.com (Kevin Ayres)
Date: Wed, 19 Sep 2018 22:03:05 +0000
Subject: [Deepsea-users] stage 1 errors on Azure
In-Reply-To: <7625052.XIqpMaYcJK@fury.home>
References: <12D2E30B-9CAC-4A7E-987B-28AD8A4E5D32@suse.com> <1624041.v2sRp242nD@fury.home> <33994725-E346-4599-842A-042E5DBFA138@suse.com> <7625052.XIqpMaYcJK@fury.home>
Message-ID:

Thanks Eric! I understand. Couldn't find localhost - DOH moment. salt-master restarted fine each time without errors, but the api was failing.

salt:~ # salt-run validate.saltapi
salt-api : ["Salt API is failing to authenticate - try 'systemctl restart salt-master': list index out of range"]
False

localhost was missing due to the heavy /etc/hosts modifications I made for Azure instance resolution. I just appended localhost onto "127.0.0.1 salt.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net salt"

salt:~ # salt-run validate.saltapi
salt-api : valid

salt:~ # curl -si localhost:8000/login -H "Accept: application/json" -d username=admin -d sharedsecret=fbc39dd2-2bba-42ec-ab9b-7d9e71b84047 -d eauth=sharedsecret
HTTP/1.1 200 OK
Content-Length: 204
Access-Control-Expose-Headers: GET, POST
Vary: Accept-Encoding
Server: CherryPy/3.6.0
Allow: GET, HEAD, POST
Access-Control-Allow-Credentials: true
Date: Wed, 19 Sep 2018 21:39:28 GMT
Access-Control-Allow-Origin: *
X-Auth-Token: 640bb306bd8fb202ef71757aac83f0db9beb4e11
Content-Type: application/json
Set-Cookie: session_id=640bb306bd8fb202ef71757aac83f0db9beb4e11; expires=Thu, 20 Sep 2018 07:39:28 GMT; Path=/

{"return": [{"perms": [".*", "@runner", "@wheel"], "start": 1537393168.443645, "token": "640bb306bd8fb202ef71757aac83f0db9beb4e11", "expire": 1537436368.443646, "user": "admin", "eauth": "sharedsecret"}]}salt:~ #

Now Stage 1 runs through.

salt:~ # salt-run state.orch ceph.stage.discovery
salt-api : valid
deepsea_minions : valid
master_minion : valid
ceph_version : valid
[WARNING ] All minions are ready
{}
salt_master:
  Name: minions.ready - Function: salt.runner - Result: Changed Started: - 21:59:25.186357 Duration: 1528.309 ms
  Name: refresh_pillar0 - Function: salt.state - Result: Changed Started: - 21:59:26.714801 Duration: 340.017 ms
  Name: populate.proposals - Function: salt.runner - Result: Changed Started: - 21:59:27.055255 Duration: 5107.852 ms
  Name: proposal.populate - Function: salt.runner - Result: Changed Started: - 21:59:32.163281 Duration: 2578.835 ms

Summary for salt_master
------------
Succeeded: 4 (changed=4)
Failed: 0
------------
Total states run: 4
Total run time: 9.555 s

And the proposals exist.

salt:~ # ls /srv/pillar/ceph/proposals/
cluster-ceph config role-admin role-client-cephfs role-client-nfs role-ganesha role-master role-mgr role-openattic
cluster-unassigned profile-default role-benchmark-rbd role-client-iscsi role-client-radosgw role-igw role-mds role-mon role-rgw

Thanks again Eric!

~ Kevin

From: on behalf of Eric Jackson
Reply-To: Discussions about the DeepSea management framework for Ceph
Date: Wednesday, September 19, 2018 at 1:39 PM
To: "deepsea-users at lists.suse.com"
Subject: Re: [Deepsea-users] stage 1 errors on Azure

[...]
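Kevin's fix boils down to making sure the name the validation curls actually resolves on the master. A small sketch of that check using only standard tools; the FQDN in the comment is the one from his mail and is specific to his Azure vNet.

```bash
#!/bin/bash
# The loopback entry should carry "localhost" in addition to the node's
# Azure-internal FQDN and short name, e.g.:
#   127.0.0.1  salt.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net  salt  localhost
getent hosts localhost || echo "localhost does not resolve - check /etc/hosts"

# With name resolution fixed, the DeepSea check should pass again:
salt-run validate.saltapi
```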
The check we do is a curl command to verify that the salt-api is answering. curl -si localhost:8000/login -H "Accept: application/json" -d username=admin -d sharedsecret=xxx -d eauth=sharedsecret The sharedsecret is in /etc/salt/master/sharedsecret.conf. It's possible to have the Salt master remember the previous contents of that file and the Salt api to use the current contents if an admin does things just right :) . That's why we give the error message about restarting the Salt master. However, if the above curl command fails because the Salt-api is down or maybe localhost is not defined in /etc/hosts (ran into that once). The curl command may shed more light on the failure. *** As far as how would you have found that curl command. Take a look at the contents of /srv/modules/runners/validate.py. About line 626, you will see the python code is literally calling the curl command. In other words, do not be intimidated about the python code. Much of it is calling some of the same command line tools that you would use. We are just using Salt to do much of this in parallel. It's also possible to run this directly without invoking Stage 1. # salt-run validate.saltapi Although the validations can be frustrating, not having them is worse. The situations where we did not check for Salt api lead to incredibly painful debug sessions. Eric On Wednesday, September 19, 2018 3:56:17 PM EDT Kevin Ayres wrote: > Thanks Eric, Yes, I understand this but worded it poorly. I don't see any > issues with NTP or DNS. Something else is amiss. Should deepsea be > installed after salt as outlined in the deployment doc, or before? > salt:~ # salt-run state.orch ceph.stage.discovery > salt-api : ["Salt API is failing to authenticate - try > 'systemctl restart salt-master': list index out of range"] deepsea_minions > : valid > master_minion : valid > ceph_version : valid > [ERROR ] No highstate or sls specified, no execution made > salt_master: > ---------- > ID: salt-api failed > Function: salt.state > Name: just.exit > Result: False > Comment: No highstate or sls specified, no execution made > Started: 19:38:41.962044 > Duration: 0.734 ms > Changes: > > Summary for salt_master > ------------ > Succeeded: 0 > Failed: 1 > ------------ > Total states run: 1 > Total run time: 0.734 ms > > > salt:~ # tail -f /var/log/salt/master > 2018-09-19 18:44:36,555 [salt.loaded.ext.runners.minions][WARNING ][15319] > All minions are ready 2018-09-19 19:38:41,955 [salt.transport.ipc][ERROR > ][1626] Exception occurred while handling stream: [Errno 0] Success > 2018-09-19 19:38:41,962 [salt.state ][ERROR ][40826] No highstate > or sls specified, no execution made > salt:~ # ls /srv/pillar/ceph/proposals > ls: cannot access '/srv/pillar/ceph/proposals': No such file or directory > > salt:~ # ls /srv/pillar/ceph/ > benchmarks deepsea_minions.sls deepsea_minions.sls.rpmsave > init.sls master_minion.sls master_minion.sls.rpmsave stack > > ~ Kevin > > On 9/19/18, 12:37 PM, "deepsea-users-bounces at lists.suse.com on behalf of > Eric Jackson" ejackson at suse.com> wrote: > Hi Kevin, > Stage 0 only does the "preparation" part. That is, sync'ing salt > modules, zypper updates, etc. Stage 1 is the "discovery" part that > interrogates the minions and then creates the roles and storage fragments. > If your salt-api issue is resolved, Stage 1 should run relatively quick. > > Eric > > On Wednesday, September 19, 2018 3:20:43 PM EDT Kevin Ayres wrote: > > > Thanks Joel, yes DNS, NTP is configured and behaving correctly. 
> > SP3/SES5 > > from current repo. salt-api service, master, minion service running > > (with > > one error.) > > I?m walking through the Deployment guide line by line with > > > same result, now on my second freshly built master node. Salt output > > is at > > the bottom of this message. Key: After stage 0, the */proposals > > directory > > has NOT been created. > > Here?s my build on a single flat network(Azure vNet 172.19.20.0/24): > > Root ssh enabled and key based login from master to all nodes as root. > > All > > nodes rebooted before salt stage. > > All nodes using identical image and > > > fully patched CPE_NAME="cpe:/o:suse:sles:12:sp3", firewall off, etc. - > > the > > Azure instance defaults. > > Salt (and all nodes):~ # zypper lr -E > > Repository priorities are without effect. All enabled repositories > > share the same priority. > > # | Alias > > > | Name | Enabled | GPG Check > > | | > > > > Refresh > > ---+------------------------------------------------------------------ > > --+-- > > ---------------------------------+---------+-----------+-------- 3 | > > SUSE_Enterprise_Storage_5_x86_64:SUSE-Enterprise-Storage-5-Pool | > > SUSE-Enterprise-Storage-5-Pool | Yes | (r ) Yes | No 5 | > > SUSE_Enterprise_Storage_5_x86_64:SUSE-Enterprise-Storage-5-Updates | > > SUSE-Enterprise-Storage-5-Updates | Yes | (r ) Yes | Yes 8 | > > SUSE_Linux_Enterprise_Server_12_SP3_x86_64:SLES12-SP3-Pool | > > SLES12-SP3-Pool | Yes | (r ) Yes | No 10 | > > SUSE_Linux_Enterprise_Server_12_SP3_x86_64:SLES12-SP3-Updates | > > SLES12-SP3-Updates | Yes | (r ) Yes | Yes > > **DNS** all nodes resolve bidirectionally. Azure cares for DNS but > > I?ve also updated hosts files. > > salt:~ # hostname > > > salt > > salt:~ # ping salt > > PING salt.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net > > (127.0.0.1) > > 56(84) bytes of data. > > 64 bytes from > > > salt.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net (127.0.0.1): > > icmp_seq=1 ttl=64 time=0.030 ms > > 104.211.27.224 Outside NAT to 172.19.20.10 > > salt.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net salt > > 172.19.20.12 > > > mon1.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net mon1 > > > > 172.19.20.13 > > mon2.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net > > mon2 172.19.20.14 > > mon3.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net mon3 > > 172.19.20.15 > > > osd1.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net osd1 > > > > 172.19.20.16 > > osd2.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net > > osd2 172.19.20.17 > > osd3.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net osd3 > > 172.19.20.18 > > > igw1.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net ogw1 > > > > **NTP** all nodes 5 minutes sync interval to same Stratum 1 server in > > same > > GEO as Azure AZ: (US East) navobs1.gatech.edu as shown: > > bash-3.2$ pssh -h > > > pssh-hosts -l sesuser -i sudo ntpq -p > > [1] 11:15:27 [SUCCESS] mon3 > > > > remote refid st t when poll reach delay offset > > > > > > jitter > > ====================================================================== > > ===== === *navobs1.gatech. .GPS. 1 u 19 64 1 > > 15.596 -4.863 0.333 [2] 11:15:27 [SUCCESS] salt > > > > remote refid st t when poll reach delay offset > > > > > > jitter > > ====================================================================== > > ===== === navobs1.gatech. .GPS. 
1 u 42 64 1 > > 17.063 -6.702 0.000 [3] 11:15:27 [SUCCESS] igw1 > > > > remote refid st t when poll reach delay offset > > > > > > jitter > > ====================================================================== > > ===== === *navobs1.gatech. .GPS. 1 u 18 64 1 > > 17.394 -27.874 7.663 [4] 11:15:27 [SUCCESS] osd1 > > > > remote refid st t when poll reach delay offset > > > > > > jitter > > ====================================================================== > > ===== === *navobs1.gatech. .GPS. 1 u 21 64 1 > > 16.962 -3.755 0.813 [5] 11:15:27 [SUCCESS] osd2 > > > > remote refid st t when poll reach delay offset > > > > > > jitter > > ====================================================================== > > ===== === *navobs1.gatech. .GPS. 1 u 22 64 1 > > 15.832 -4.709 3.062 [6] 11:15:27 [SUCCESS] osd3 > > > > remote refid st t when poll reach delay offset > > > > > > jitter > > ====================================================================== > > ===== === *navobs1.gatech. .GPS. 1 u 26 64 1 > > 15.877 -3.252 19.131 [7] 11:15:27 [SUCCESS] mon1 > > > > remote refid st t when poll reach delay offset > > > > > > jitter > > ====================================================================== > > ===== === navobs1.gatech. .GPS. 1 u 2 64 1 > > 16.120 -4.263 0.000 [8] 11:15:27 [SUCCESS] mon2 > > > > remote refid st t when poll reach delay offset > > > > > > jitter > > ====================================================================== > > ===== === navobs1.gatech. .GPS. 1 u 2 64 1 > > 16.108 -7.713 0.959 > > > > **SALT** > > salt:~ # systemctl status salt-api salt-master salt-minion |grep > > 'active > > (running)' > > Active: active (running) since Wed 2018-09-19 18:14:39 UTC; > > > 13min ago Active: active (running) since Wed 2018-09-19 18:14:42 UTC; > > 13min ago Active: active (running) since Wed 2018-09-19 18:14:41 > > UTC; 13min ago salt:~ # systemctl status salt-api salt-master > > salt-minion |grep ERROR > > > Sep 19 18:14:49 salt salt-minion[1413]: [ERROR ] Function > > > > cephimages.list in mine_functions failed to execute > > > > > salt:~ # salt-key --list-all > > > > Accepted Keys: > > igw1 > > mon1 > > mon2 > > mon3 > > osd1 > > osd2 > > osd3 > > salt > > Denied Keys: > > Unaccepted Keys: > > Rejected Keys: > > > > > > salt:~ # salt '*' test.ping > > salt: > > > > True > > > > osd2: > > > > True > > > > mon3: > > > > True > > > > osd3: > > > > True > > > > osd1: > > > > True > > > > mon2: > > > > True > > > > igw1: > > > > True > > > > mon1: > > > > True > > > > > > salt:~ # cat /srv/pillar/ceph/master_minion.sls > > master_minion: salt > > > > salt:~ # cat /srv/pillar/ceph/deepsea_minions.sls > > ... > > # Choose all minions > > deepsea_minions: '*' > > ... > > > > **SALT STAGES** > > Stage 0 is successful with no errors but does not create the > > proposals > > folder. 
> > > > > salt:~ # salt-run state.orch ceph.stage.prep > > > > deepsea_minions : valid > > master_minion : valid > > ceph_version : valid > > [WARNING ] All minions are ready > > salt_master: > > > > Name: sync master - Function: salt.state - Result: Changed > > Started: - > > > > 18:44:20.440255 Duration: 949.98 ms > > Name: salt-api - Function: salt.state > > > - Result: Changed Started: - 18:44:21.390365 Duration: 3256.749 ms > > Name: > > repo master - Function: salt.state - Result: Clean Started: - > > 18:44:24.647227 Duration: 351.0 ms Name: metapackage master - > > Function: > > salt.state - Result: Clean Started: - 18:44:24.998333 Duration: > > 1127.063 ms Name: prepare master - Function: salt.state - Result: > > Changed Started: - 18:44:26.125514 Duration: 4109.917 ms Name: > > filequeue.remove - Function: salt.runner - Result: Changed Started: - > > 18:44:30.235610 Duration: 2071.199 ms Name: restart master - > > Function: salt.state - Result: Clean Started: - 18:44:32.306972 > > Duration: 1006.268 ms Name: filequeue.add - Function: salt.runner - > > Result: Changed Started: - 18:44:33.313369 Duration: 1352.98 ms Name: > > minions.ready - Function: salt.runner - Result: Changed Started: - > > 18:44:34.666528 Duration: 1891.677 ms Name: repo - Function: > > salt.state - Result: Clean Started: - 18:44:36.558363 Duration: > > 553.342 ms Name: metapackage minions - Function: salt.state - Result: > > Clean Started: - 18:44:37.111825 Duration: 3993.733 ms Name: common > > packages - Function: salt.state - Result: Clean Started: - > > 18:44:41.105706 Duration: 2434.079 ms Name: sync - Function: > > salt.state - Result: Changed Started: - > > 18:44:43.539897 Duration: 1381.692 ms Name: mines - Function: > > salt.state - > > Result: Clean Started: - 18:44:44.921708 Duration: 1657.019 ms Name: > > updates - Function: salt.state - Result: Changed Started: - > > 18:44:46.578853 Duration: 11183.347 ms Name: restart - Function: > > salt.state - Result: Clean Started: - 18:44:57.762346 Duration: > > 1553.957 ms Name: mds restart noop - Function: test.nop - Result: > > Clean Started: - 18:44:59.316442 Duration: 0.348 ms > > > > Summary for salt_master > > ------------- > > Succeeded: 17 (changed=8) > > Failed: 0 > > ------------- > > Total states run: 17 > > Total run time: 38.874 s > > > > > > > > Before running Stage 1, the /srv/pillar/ceph/proposals directory does > > not > > exist. > > salt:~ # ls /srv/pillar/ceph/proposals/ > > > ls: cannot access '/srv/pillar/ceph/proposals/': No such file or > > > > directory > > > > > That?s where I?m at ? Googling.. > > > > ~ Kevin > > > > From: on behalf of Joel Zhou > > > > Reply-To: Discussions about the DeepSea management > > > framework for Ceph Date: Tuesday, > > September > > 18, 2018 at 11:34 PM > > To: Discussions about the DeepSea management framework for Ceph > > > > Subject: Re: [Deepsea-users] stage 1 errors > > > on Azure > > > > Hi Kevin, > > > > My short answer is, > > > > Step 1, before stage 0, check your salt-api service on salt-master > > node > > first. > > ```bash > > > zypper install -y salt-api > > systemctl enable salt-api.service > > systemctl start salt-api.service > > ``` > > Step 2, make sure NTP service works correctly on all nodes, which > > means time synchronized correctly on all nodes. > > Step 3, reboot all your nodes, if > > > acceptable. In case of kernel updated somehow. Step 4, then you have > > to > > start over again from stage 0 to 5. 
> > > > Basically, deepsea is a bunch of salt scripts, and salt based on > > python2 > > and/or python3. > > I have no clues about your whole running stack, so assume > > > SLES 12 sp3 + SES 5, which works fine and supported. More info would > > be > > helpful, and also your purpose, such as for practice on your own, or > > for > > PoC/testing to meet customer?s demands. > > Regards, > > > > -- > > Joel Zhou ??? > > Senior Storage Technologist, APJ > > > > Mobile: +86 18514577601 > > Email: joel.zhou at suse.com > > > > From: on behalf of Kevin Ayres > > > > Reply-To: Discussions about the DeepSea management > > > framework for Ceph Date: Tuesday, > > September > > 18, 2018 at 4:49 PM > > To: "deepsea-users at lists.suse.com" > > Subject: [Deepsea-users] stage 1 errors on Azure > > > > Hey guys, I can?t seem to get past stage 1. Stage 0 complete > > successfully. > > Same output with deepsea command. The master and minion service are > > running and bidirectional host resolution are good. Keys are all > > accepted. From what I can determine, the default files are not > > created by stage 0 for some reason. Thoughts? What I?m seeing is that > > it fails to create the > > /srv/pillar/ceph/proposals > > > > > I?m running through this doc line by line: > > https://www.suse.com/documentation/suse-enterprise-storage-5/singlehtm > > l/boo k_storage_deployment/book_storage_deployment.html#deepsea.cli > > > > > ~ Kevin > > > > > > salt:~ # salt-run state.orch ceph.stage.discovery > > > > salt-api : ["Salt API is failing to authenticate - > > try > > 'systemctl restart salt-master': list index out of range"] > > > > > deepsea_minions : valid > > > > master_minion : valid > > > > ceph_version : valid > > > > [ERROR ] No highstate or sls specified, no execution made > > > > salt_master: > > > > ---------- > > > > > > ID: salt-api failed > > > > > > > > Function: salt.state > > > > > > > > Name: just.exit > > > > > > > > Result: False > > > > > > > > Comment: No highstate or sls specified, no execution made > > > > > > > > Started: 22:30:53.628882 > > > > > > > > Duration: 0.647 ms > > > > > > > > Changes: > > > > > > > > > > Summary for salt_master > > > > ------------ > > > > Succeeded: 0 > > > > Failed: 1 > > > > ------------ > > > > Total states run: 1 > > > > Total run time: 0.647 ms > > > > salt:~ # !tail > > tail -f /var/log/salt/master > > 2018-09-18 22:29:08,797 [salt.loaded.ext.runners.validate][WARNING > > ][8499] > > role-igw/cluster/igw*.sls matched no files > > 2018-09-18 22:29:08,797 > > > [salt.loaded.ext.runners.validate][WARNING ][8499] > > role-openattic/cluster/salt.sls matched no files 2018-09-18 > > 22:29:08,797 > > [salt.loaded.ext.runners.validate][WARNING ][8499] > > config/stack/default/global.yml matched no files 2018-09-18 > > 22:29:08,798 > > [salt.loaded.ext.runners.validate][WARNING ][8499] > > config/stack/default/ceph/cluster.yml matched no files 2018-09-18 > > 22:29:08,798 [salt.loaded.ext.runners.validate][WARNING ][8499] > > cluster/*.sls matched no files 2018-09-18 22:29:08,798 > > [salt.loaded.ext.runners.validate][WARNING ][8499] > > stack/default/ceph/minions/*.yml matched no files 2018-09-18 > > 22:29:08,822 > > [salt.state ][ERROR ][8499] No highstate or sls specified, no > > execution made 2018-09-18 22:29:52,472 [salt.transport.ipc][ERROR > > ][5672] Exception occurred while handling stream: [Errno 0] Success > > 2018-09-18 22:29:56,797 [salt.state ][ERROR ][8759] No > > highstate or sls specified, no execution made 2018-09-18 22:30:53,629 > > [salt.state 
][ERROR ][9272] No highstate or sls specified, no execution made
> > There's also some issue with the salt-minion.service:
> > * salt-minion.service - The Salt Minion
> >    Loaded: loaded (/usr/lib/systemd/system/salt-minion.service; enabled; vendor preset: disabled)
> >    Active: active (running) since Tue 2018-09-18 22:46:54 UTC; 12s ago
> >  Main PID: 11082 (salt-minion)
> > .....
> > Sep 18 22:46:54 salt systemd[1]: Started The Salt Minion.
> > Sep 18 22:47:00 salt salt-minion[11082]: [ERROR ] Function cephimages.list in mine_functions failed to execute
>
> _______________________________________________
> Deepsea-users mailing list
> Deepsea-users at lists.suse.com
> http://lists.suse.com/mailman/listinfo/deepsea-users
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From joel.zhou at suse.com Wed Sep 19 16:46:59 2018
From: joel.zhou at suse.com (Joel Zhou)
Date: Wed, 19 Sep 2018 22:46:59 +0000
Subject: [Deepsea-users] stage 1 errors on Azure
In-Reply-To:
References: <12D2E30B-9CAC-4A7E-987B-28AD8A4E5D32@suse.com> <1624041.v2sRp242nD@fury.home> <33994725-E346-4599-842A-042E5DBFA138@suse.com> <7625052.XIqpMaYcJK@fury.home>
Message-ID:

Kevin,

In other words, your DNS is NOT configured and behaving correctly. Personally, I'd append every single record to the hosts file, including the salt master, even without a localhost record. Anyway, you are good to go.

Regards,
--
Joel Zhou
Senior Storage Technologist, APJ

Mobile: +86 18514577601
Email: joel.zhou at suse.com

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From kevin.ayres at suse.com Wed Sep 19 17:04:11 2018
From: kevin.ayres at suse.com (Kevin Ayres)
Date: Wed, 19 Sep 2018 23:04:11 +0000
Subject: [Deepsea-users] stage 1 errors on Azure
In-Reply-To:
References: <12D2E30B-9CAC-4A7E-987B-28AD8A4E5D32@suse.com> <1624041.v2sRp242nD@fury.home> <33994725-E346-4599-842A-042E5DBFA138@suse.com> <7625052.XIqpMaYcJK@fury.home>
Message-ID:

Yup indeed! Thanks Joel. I have new issues at stage 2 but will start a new thread later. UG. Azure/AWS oddities may need to be a whole topic at some point. Not that I'd do this on AWS except for deployment or automation testing..

~ Kevin

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
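A quick aside before the stage 2 thread below: the name-resolution trap that broke the salt-api validation above is cheap to check for up front. This is a sketch rather than anything from the DeepSea docs; run it on the master and adjust names to your environment.

```bash
# Sanity checks for the /etc/hosts problem discussed above. Both lookups must
# succeed on the master for the salt-api login (and validate.saltapi) to work.
getent hosts localhost          # must resolve, typically to 127.0.0.1
getent hosts "$(hostname -f)"   # the master's FQDN must resolve as well
hostname -s                     # should print the real short name, not "localhost"
```

If any of these come back wrong (for example the node's own name resolving only to 127.0.0.1, as in the ping output earlier in the thread), later per-host artifacts such as the cached keyrings in the next message can end up named after the wrong host.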
salt:~ # salt-run state.orch ceph.stage.discovery salt-api : valid deepsea_minions : valid master_minion : valid ceph_version : valid [WARNING ] All minions are ready {} salt_master: Name: minions.ready - Function: salt.runner - Result: Changed Started: - 21:59:25.186357 Duration: 1528.309 ms Name: refresh_pillar0 - Function: salt.state - Result: Changed Started: - 21:59:26.714801 Duration: 340.017 ms Name: populate.proposals - Function: salt.runner - Result: Changed Started: - 21:59:27.055255 Duration: 5107.852 ms Name: proposal.populate - Function: salt.runner - Result: Changed Started: - 21:59:32.163281 Duration: 2578.835 ms Summary for salt_master ------------ Succeeded: 4 (changed=4) Failed: 0 ------------ Total states run: 4 Total run time: 9.555 s And Proposals exist. salt:~ # ls /srv/pillar/ceph/proposals/ cluster-ceph config role-admin role-client-cephfs role-client-nfs role-ganesha role-master role-mgr role-openattic cluster-unassigned profile-default role-benchmark-rbd role-client-iscsi role-client-radosgw role-igw role-mds role-mon role-rgw Thanks again Eric! ~ Kevin From: on behalf of Eric Jackson Reply-To: Discussions about the DeepSea management framework for Ceph Date: Wednesday, September 19, 2018 at 1:39 PM To: "deepsea-users at lists.suse.com" Subject: Re: [Deepsea-users] stage 1 errors on Azure The rpm would be installed after Salt is configured. I understand some installations install both Salt and DeepSea via YaST. We did try to make accommodations for that scenario. So, your salt-api is still down. We ran into drastically different reasons why the salt-api can fail. We did not have the bandwidth to address each type of failure. The check we do is a curl command to verify that the salt-api is answering. curl -si localhost:8000/login -H "Accept: application/json" -d username=admin -d sharedsecret=xxx -d eauth=sharedsecret The sharedsecret is in /etc/salt/master/sharedsecret.conf. It's possible to have the Salt master remember the previous contents of that file and the Salt api to use the current contents if an admin does things just right :) . That's why we give the error message about restarting the Salt master. However, if the above curl command fails because the Salt-api is down or maybe localhost is not defined in /etc/hosts (ran into that once). The curl command may shed more light on the failure. *** As far as how would you have found that curl command. Take a look at the contents of /srv/modules/runners/validate.py. About line 626, you will see the python code is literally calling the curl command. In other words, do not be intimidated about the python code. Much of it is calling some of the same command line tools that you would use. We are just using Salt to do much of this in parallel. It's also possible to run this directly without invoking Stage 1. # salt-run validate.saltapi Although the validations can be frustrating, not having them is worse. The situations where we did not check for Salt api lead to incredibly painful debug sessions. Eric On Wednesday, September 19, 2018 3:56:17 PM EDT Kevin Ayres wrote: > Thanks Eric, Yes, I understand this but worded it poorly. I don't see any > issues with NTP or DNS. Something else is amiss. Should deepsea be > installed after salt as outlined in the deployment doc, or before? 
> salt:~ # salt-run state.orch ceph.stage.discovery > salt-api : ["Salt API is failing to authenticate - try > 'systemctl restart salt-master': list index out of range"] deepsea_minions > : valid > master_minion : valid > ceph_version : valid > [ERROR ] No highstate or sls specified, no execution made > salt_master: > ---------- > ID: salt-api failed > Function: salt.state > Name: just.exit > Result: False > Comment: No highstate or sls specified, no execution made > Started: 19:38:41.962044 > Duration: 0.734 ms > Changes: > > Summary for salt_master > ------------ > Succeeded: 0 > Failed: 1 > ------------ > Total states run: 1 > Total run time: 0.734 ms > > > salt:~ # tail -f /var/log/salt/master > 2018-09-19 18:44:36,555 [salt.loaded.ext.runners.minions][WARNING ][15319] > All minions are ready 2018-09-19 19:38:41,955 [salt.transport.ipc][ERROR > ][1626] Exception occurred while handling stream: [Errno 0] Success > 2018-09-19 19:38:41,962 [salt.state ][ERROR ][40826] No highstate > or sls specified, no execution made > salt:~ # ls /srv/pillar/ceph/proposals > ls: cannot access '/srv/pillar/ceph/proposals': No such file or directory > > salt:~ # ls /srv/pillar/ceph/ > benchmarks deepsea_minions.sls deepsea_minions.sls.rpmsave > init.sls master_minion.sls master_minion.sls.rpmsave stack > > ~ Kevin > > On 9/19/18, 12:37 PM, "deepsea-users-bounces at lists.suse.com on behalf of > Eric Jackson" ejackson at suse.com> wrote: > Hi Kevin, > Stage 0 only does the "preparation" part. That is, sync'ing salt > modules, zypper updates, etc. Stage 1 is the "discovery" part that > interrogates the minions and then creates the roles and storage fragments. > If your salt-api issue is resolved, Stage 1 should run relatively quick. > > Eric > > On Wednesday, September 19, 2018 3:20:43 PM EDT Kevin Ayres wrote: > > > Thanks Joel, yes DNS, NTP is configured and behaving correctly. > > SP3/SES5 > > from current repo. salt-api service, master, minion service running > > (with > > one error.) > > I?m walking through the Deployment guide line by line with > > > same result, now on my second freshly built master node. Salt output > > is at > > the bottom of this message. Key: After stage 0, the */proposals > > directory > > has NOT been created. > > Here?s my build on a single flat network(Azure vNet 172.19.20.0/24): > > Root ssh enabled and key based login from master to all nodes as root. > > All > > nodes rebooted before salt stage. > > All nodes using identical image and > > > fully patched CPE_NAME="cpe:/o:suse:sles:12:sp3", firewall off, etc. - > > the > > Azure instance defaults. > > Salt (and all nodes):~ # zypper lr -E > > Repository priorities are without effect. All enabled repositories > > share the same priority. > > # | Alias > > > | Name | Enabled | GPG Check > > | | > > > > Refresh > > ---+------------------------------------------------------------------ > > --+-- > > ---------------------------------+---------+-----------+-------- 3 | > > SUSE_Enterprise_Storage_5_x86_64:SUSE-Enterprise-Storage-5-Pool | > > SUSE-Enterprise-Storage-5-Pool | Yes | (r ) Yes | No 5 | > > SUSE_Enterprise_Storage_5_x86_64:SUSE-Enterprise-Storage-5-Updates | > > SUSE-Enterprise-Storage-5-Updates | Yes | (r ) Yes | Yes 8 | > > SUSE_Linux_Enterprise_Server_12_SP3_x86_64:SLES12-SP3-Pool | > > SLES12-SP3-Pool | Yes | (r ) Yes | No 10 | > > SUSE_Linux_Enterprise_Server_12_SP3_x86_64:SLES12-SP3-Updates | > > SLES12-SP3-Updates | Yes | (r ) Yes | Yes > > **DNS** all nodes resolve bidirectionally. 
Azure cares for DNS but > > I?ve also updated hosts files. > > salt:~ # hostname > > > salt > > salt:~ # ping salt > > PING salt.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net > > (127.0.0.1) > > 56(84) bytes of data. > > 64 bytes from > > > salt.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net (127.0.0.1): > > icmp_seq=1 ttl=64 time=0.030 ms > > 104.211.27.224 Outside NAT to 172.19.20.10 > > salt.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net salt > > 172.19.20.12 > > > mon1.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net mon1 > > > > 172.19.20.13 > > mon2.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net > > mon2 172.19.20.14 > > mon3.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net mon3 > > 172.19.20.15 > > > osd1.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net osd1 > > > > 172.19.20.16 > > osd2.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net > > osd2 172.19.20.17 > > osd3.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net osd3 > > 172.19.20.18 > > > igw1.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net ogw1 > > > > **NTP** all nodes 5 minutes sync interval to same Stratum 1 server in > > same > > GEO as Azure AZ: (US East) navobs1.gatech.edu as shown: > > bash-3.2$ pssh -h > > > pssh-hosts -l sesuser -i sudo ntpq -p > > [1] 11:15:27 [SUCCESS] mon3 > > > > remote refid st t when poll reach delay offset > > > > > > jitter > > ====================================================================== > > ===== === *navobs1.gatech. .GPS. 1 u 19 64 1 > > 15.596 -4.863 0.333 [2] 11:15:27 [SUCCESS] salt > > > > remote refid st t when poll reach delay offset > > > > > > jitter > > ====================================================================== > > ===== === navobs1.gatech. .GPS. 1 u 42 64 1 > > 17.063 -6.702 0.000 [3] 11:15:27 [SUCCESS] igw1 > > > > remote refid st t when poll reach delay offset > > > > > > jitter > > ====================================================================== > > ===== === *navobs1.gatech. .GPS. 1 u 18 64 1 > > 17.394 -27.874 7.663 [4] 11:15:27 [SUCCESS] osd1 > > > > remote refid st t when poll reach delay offset > > > > > > jitter > > ====================================================================== > > ===== === *navobs1.gatech. .GPS. 1 u 21 64 1 > > 16.962 -3.755 0.813 [5] 11:15:27 [SUCCESS] osd2 > > > > remote refid st t when poll reach delay offset > > > > > > jitter > > ====================================================================== > > ===== === *navobs1.gatech. .GPS. 1 u 22 64 1 > > 15.832 -4.709 3.062 [6] 11:15:27 [SUCCESS] osd3 > > > > remote refid st t when poll reach delay offset > > > > > > jitter > > ====================================================================== > > ===== === *navobs1.gatech. .GPS. 1 u 26 64 1 > > 15.877 -3.252 19.131 [7] 11:15:27 [SUCCESS] mon1 > > > > remote refid st t when poll reach delay offset > > > > > > jitter > > ====================================================================== > > ===== === navobs1.gatech. .GPS. 1 u 2 64 1 > > 16.120 -4.263 0.000 [8] 11:15:27 [SUCCESS] mon2 > > > > remote refid st t when poll reach delay offset > > > > > > jitter > > ====================================================================== > > ===== === navobs1.gatech. .GPS. 
> > 1 u 2 64 1 16.108 -7.713 0.959
> >
> > **SALT**
> > salt:~ # systemctl status salt-api salt-master salt-minion |grep 'active (running)'
> > Active: active (running) since Wed 2018-09-19 18:14:39 UTC; 13min ago
> > Active: active (running) since Wed 2018-09-19 18:14:42 UTC; 13min ago
> > Active: active (running) since Wed 2018-09-19 18:14:41 UTC; 13min ago
> > salt:~ # systemctl status salt-api salt-master salt-minion |grep ERROR
> > Sep 19 18:14:49 salt salt-minion[1413]: [ERROR ] Function cephimages.list in mine_functions failed to execute
> >
> > salt:~ # salt-key --list-all
> > Accepted Keys:
> > igw1
> > mon1
> > mon2
> > mon3
> > osd1
> > osd2
> > osd3
> > salt
> > Denied Keys:
> > Unaccepted Keys:
> > Rejected Keys:
> >
> > salt:~ # salt '*' test.ping
> > salt:
> >     True
> > osd2:
> >     True
> > mon3:
> >     True
> > osd3:
> >     True
> > osd1:
> >     True
> > mon2:
> >     True
> > igw1:
> >     True
> > mon1:
> >     True
> >
> > salt:~ # cat /srv/pillar/ceph/master_minion.sls
> > master_minion: salt
> >
> > salt:~ # cat /srv/pillar/ceph/deepsea_minions.sls
> > ...
> > # Choose all minions
> > deepsea_minions: '*'
> > ...
> >
> > **SALT STAGES**
> > Stage 0 is successful with no errors but does not create the proposals folder.
> >
> > salt:~ # salt-run state.orch ceph.stage.prep
> > deepsea_minions          : valid
> > master_minion            : valid
> > ceph_version             : valid
> > [WARNING ] All minions are ready
> > salt_master:
> >   Name: sync master - Function: salt.state - Result: Changed Started: - 18:44:20.440255 Duration: 949.98 ms
> >   Name: salt-api - Function: salt.state - Result: Changed Started: - 18:44:21.390365 Duration: 3256.749 ms
> >   Name: repo master - Function: salt.state - Result: Clean Started: - 18:44:24.647227 Duration: 351.0 ms
> >   Name: metapackage master - Function: salt.state - Result: Clean Started: - 18:44:24.998333 Duration: 1127.063 ms
> >   Name: prepare master - Function: salt.state - Result: Changed Started: - 18:44:26.125514 Duration: 4109.917 ms
> >   Name: filequeue.remove - Function: salt.runner - Result: Changed Started: - 18:44:30.235610 Duration: 2071.199 ms
> >   Name: restart master - Function: salt.state - Result: Clean Started: - 18:44:32.306972 Duration: 1006.268 ms
> >   Name: filequeue.add - Function: salt.runner - Result: Changed Started: - 18:44:33.313369 Duration: 1352.98 ms
> >   Name: minions.ready - Function: salt.runner - Result: Changed Started: - 18:44:34.666528 Duration: 1891.677 ms
> >   Name: repo - Function: salt.state - Result: Clean Started: - 18:44:36.558363 Duration: 553.342 ms
> >   Name: metapackage minions - Function: salt.state - Result: Clean Started: - 18:44:37.111825 Duration: 3993.733 ms
> >   Name: common packages - Function: salt.state - Result: Clean Started: - 18:44:41.105706 Duration: 2434.079 ms
> >   Name: sync - Function: salt.state - Result: Changed Started: - 18:44:43.539897 Duration: 1381.692 ms
> >   Name: mines - Function: salt.state - Result: Clean Started: - 18:44:44.921708 Duration: 1657.019 ms
> >   Name: updates - Function: salt.state - Result: Changed Started: - 18:44:46.578853 Duration: 11183.347 ms
> >   Name: restart - Function: salt.state - Result: Clean Started: - 18:44:57.762346 Duration: 1553.957 ms
> >   Name: mds restart noop - Function: test.nop - Result: Clean Started: - 18:44:59.316442 Duration: 0.348 ms
> >
> > Summary for salt_master
> > -------------
> > Succeeded: 17 (changed=8)
> > Failed:     0
> > -------------
> > Total states run:     17
> > Total run time:   38.874 s
> >
> > Before running Stage 1, the /srv/pillar/ceph/proposals directory does not exist.
> > salt:~ # ls /srv/pillar/ceph/proposals/
> > ls: cannot access '/srv/pillar/ceph/proposals/': No such file or directory
> >
> > That's where I'm at ... Googling..
> >
> > ~ Kevin
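For reference, and not a command sequence taken from the thread: in a standard DeepSea/SES 5 layout the /srv/pillar/ceph/proposals tree is generated by stage 1 (discovery), not stage 0, so its absence right after stage 0 is expected; the sketch below (default paths assumed, nothing cluster-specific) shows the usual check once salt-api is healthy.

```bash
# Minimal sketch, assuming the default DeepSea paths: stage 1 (discovery) is what
# writes /srv/pillar/ceph/proposals, so verify salt-api and then run discovery.
systemctl --no-pager status salt-api        # must be active for the DeepSea runners
salt-run state.orch ceph.stage.discovery    # a.k.a. stage 1
ls /srv/pillar/ceph/proposals/              # should now contain role-*/ and profile-*/ entries
```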
_______________________________________________
Deepsea-users mailing list
Deepsea-users at lists.suse.com
http://lists.suse.com/mailman/listinfo/deepsea-users

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From kevin.ayres at suse.com  Fri Sep 21 12:18:06 2018
From: kevin.ayres at suse.com (Kevin Ayres)
Date: Fri, 21 Sep 2018 18:18:06 +0000
Subject: [Deepsea-users] Stage 2 error on Azure
Message-ID:

I redeployed this SES 5 cluster, ripped out all of the Azure naming junk, and moved a local DNS server onto the Adm node. So DNS, NTP, resolv.conf, hosts, and .ssh/ files appear to be correct throughout. Stages 0 and 1 complete; stage 2 fails.
I'm troubleshooting this stage 2 error. Something to do with keyring caching, and possibly the manager role running on the admin node? I've restarted services, the node, etc. It seems to be a minion issue on the salt master (adm) node. Any guidance is appreciated. Googling madly..

salt:
    Data failed to compile:
----------
    Rendering SLS 'base:ceph.mgr.key.default' failed: Conflicting ID '/srv/salt/ceph/mgr/cache/localhost.keyring'

salt:~ # ls -a /srv/salt/ceph/mgr/cache/
.  ..
No such folder..

salt:~ # cat /srv/pillar/ceph/proposals/policy.cfg
...
role-mgr/cluster/mon*.sls

salt:~ # cat /srv/pillar/ceph/proposals/role-mgr/cluster/salt.sls
roles:
- mgr

Thank you!
~ Kevin

Complete stage 2 output:

salt:~ # salt-run state.orch ceph.stage.configure
deepsea_minions          : valid
yaml_syntax              : valid
profiles_populated       : valid
public network           : 172.19.20.0/24
cluster network          : 172.19.20.0/24
[ERROR ] Run failed on minions: salt
Failures:
    salt:
        Data failed to compile:
    ----------
        Rendering SLS 'base:ceph.mgr.key.default' failed: Conflicting ID '/srv/salt/ceph/mgr/cache/localhost.keyring'

salt_master:
  Name: push.proposal - Function: salt.runner - Result: Changed Started: - 17:04:46.183848 Duration: 1392.712 ms
  Name: refresh_pillar1 - Function: salt.state - Result: Changed Started: - 17:04:47.576691 Duration: 784.477 ms
  Name: advise.networks - Function: salt.runner - Result: Clean Started: - 17:04:48.361300 Duration: 1766.973 ms
  Name: admin key - Function: salt.state - Result: Clean Started: - 17:04:50.128391 Duration: 404.333 ms
  Name: mon key - Function: salt.state - Result: Changed Started: - 17:04:50.532926 Duration: 563.858 ms
----------
          ID: mgr key
    Function: salt.state
      Result: False
     Comment: Run failed on minions: salt
              Failures:
                  salt:
                      Data failed to compile:
                  ----------
                      Rendering SLS 'base:ceph.mgr.key.default' failed: Conflicting ID '/srv/salt/ceph/mgr/cache/localhost.keyring'
     Started: 17:04:51.096948
    Duration: 2690.64 ms
     Changes:

Summary for salt_master
------------
Succeeded: 5 (changed=3)
Failed:    1
------------
Total states run:     6
Total run time:   7.603 s

salt:~ # service salt-minion status
* salt-minion.service - The Salt Minion
   Loaded: loaded (/usr/lib/systemd/system/salt-minion.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2018-09-20 23:53:42 UTC; 17h ago
 Main PID: 13456 (salt-minion)
    Tasks: 6 (limit: 512)
   CGroup: /system.slice/salt-minion.service
           |-13456 /usr/bin/python /usr/bin/salt-minion
           |-13462 /usr/bin/python /usr/bin/salt-minion
           `-13465 /usr/bin/python /usr/bin/salt-minion

Sep 21 13:03:37 salt salt-minion[13456]: [ERROR ] Exception occurred while handling stream: [Errno 0] Success
Sep 21 13:05:57 salt salt-minion[13456]: [ERROR ] Exception occurred while handling stream: [Errno 0] Success
Sep 21 13:50:19 salt salt-minion[13456]: [ERROR ] Exception occurred while handling stream: [Errno 0] Success
Sep 21 13:53:44 salt salt-minion[13456]: [ERROR ] Function cephimages.list in mine_functions failed to execute
Sep 21 14:47:32 salt salt-minion[13456]: [ERROR ] Exception occurred while handling stream: [Errno 0] Success
Sep 21 14:53:44 salt salt-minion[13456]: [ERROR ] Function cephimages.list in mine_functions failed to execute
Sep 21 15:05:03 salt salt-minion[13456]: [ERROR ] Exception occurred while handling stream: [Errno 0] Success
Sep 21 15:53:44 salt salt-minion[13456]: [ERROR ] Function cephimages.list in mine_functions failed to execute
Sep 21 16:53:44 salt salt-minion[13456]: [ERROR ] Function cephimages.list in mine_functions failed to execute
Sep 21 17:04:53 salt salt-minion[13456]: [CRITICAL] Rendering SLS 'base:ceph.mgr.key.default' failed: Conflicting ID '/srv/salt/ceph/mgr/cache/localhost.keyring'

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ejackson at suse.com  Fri Sep 21 12:55:26 2018
From: ejackson at suse.com (Eric Jackson)
Date: Fri, 21 Sep 2018 14:55:26 -0400
Subject: [Deepsea-users] Stage 2 error on Azure
In-Reply-To:
References:
Message-ID: <2179785.o9lRQUWKH7@fury.home>

Hi Kevin,
  Check your minion names. Try 'salt-key -L'. The reason for the "Conflicting ID" is that Salt will unroll a Jinja loop. For example, if you have three minions assigned the mgr role, then the preprocessing of /srv/salt/ceph/mgr/key/default.sls will create three separate stanzas. YAML requires unique identifiers.
  The key file names you expected in /srv/salt/ceph/mgr/cache would be minion1.keyring, minion2.keyring and minion3.keyring. However, you are getting localhost.keyring. So, you have at least two and likely three minions all replying to the Salt master that they are "localhost".

  Check on each of your salt minions the value in /etc/salt/minion_id. If that is incorrect (and says "localhost"), delete the minion from the Salt master, correct the minion_id file, restart the salt minion and then accept the key on the Salt master. The commands would be

admin# salt-key -d ID
minion1# vi /etc/salt/minion_id
minion1# systemctl restart salt-minion
admin# salt-key -A

  Once that is resolved, you can run just the ceph.mgr.key step to verify.

admin# salt 'admin*' state.apply ceph.mgr.key

  When that works, try Stage 2 again.

Eric
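A quick way to see which ID each minion is actually reporting, sketched here with plain Salt commands (nothing DeepSea-specific assumed):

```bash
# Compare the key names the master holds with what each minion says about itself.
salt-key -L                                  # key names known to the master
salt '*' grains.item id host fqdn            # the 'id' grain is what the Jinja loop keys on
salt '*' cmd.run 'cat /etc/salt/minion_id'   # the on-disk minion ID
# Any minion that answers as "localhost" is one of the colliding entries behind
# the duplicate localhost.keyring stanza described above.
```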
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 473 bytes
Desc: This is a digitally signed message part.
URL:

From kevin.ayres at suse.com  Sat Sep 22 23:26:35 2018
From: kevin.ayres at suse.com (Kevin Ayres)
Date: Sun, 23 Sep 2018 05:26:35 +0000
Subject: [Deepsea-users] Stage 2 error on Azure
In-Reply-To: <2179785.o9lRQUWKH7@fury.home>
References: <2179785.o9lRQUWKH7@fury.home>
Message-ID: <2BD5F895-B84E-4A19-84A3-A853E1B692A6@suse.com>

Thanks Eric. No joy, I'm missing something here..

salt:~ # salt-key -L
Accepted Keys:
igw1
mon1
mon2
mon3
osd1
osd2
osd3
salt

salt:~ # ls -a /srv/salt/ceph/mgr/cache
.  ..

salt:~ # salt-key -d salt
The following keys are going to be deleted:
Accepted Keys:
salt

salt:~ # vi /etc/salt/minion_id

salt:~ # salt-key -A
The following keys are going to be accepted:
Unaccepted Keys:
salt
Proceed? [n/Y] y
Key for minion salt accepted.

salt:~ # salt 'admin*' state.apply ceph.mgr.key
No minions matched the target. No command was sent, no jid was assigned.
ERROR: No return received

salt:~ # salt 'salt*' state.apply ceph.mgr.key
salt:
    Data failed to compile:
----------
    Rendering SLS 'base:ceph.mgr.key.default' failed: Conflicting ID '/srv/salt/ceph/mgr/cache/localhost.keyring'
ERROR: Minions returned with non-zero exit code

salt:~ # salt-run state.orch ceph.stage.configure
deepsea_minions          : valid
yaml_syntax              : valid
profiles_populated       : valid
public network           : 172.19.20.0/24
cluster network          : 172.19.20.0/24
[ERROR ] Run failed on minions: salt
Failures:
    salt:
        Data failed to compile:
    ----------
        Rendering SLS 'base:ceph.mgr.key.default' failed: Conflicting ID '/srv/salt/ceph/mgr/cache/localhost.keyring'

salt_master:
  Name: push.proposal - Function: salt.runner - Result: Changed Started: - 05:22:09.015323 Duration: 1330.864 ms
  Name: refresh_pillar1 - Function: salt.state - Result: Changed Started: - 05:22:10.346301 Duration: 759.065 ms
  Name: advise.networks - Function: salt.runner - Result: Clean Started: - 05:22:11.105561 Duration: 1699.562 ms
  Name: admin key - Function: salt.state - Result: Clean Started: - 05:22:12.805243 Duration: 391.792 ms
  Name: mon key - Function: salt.state - Result: Changed Started: - 05:22:13.197149 Duration: 381.892 ms
----------
          ID: mgr key
    Function: salt.state
      Result: False
     Comment: Run failed on minions: salt
              Failures:
                  salt:
                      Data failed to compile:
                  ----------
                      Rendering SLS 'base:ceph.mgr.key.default' failed: Conflicting ID '/srv/salt/ceph/mgr/cache/localhost.keyring'
     Started: 05:22:13.579153
    Duration: 2510.463 ms
     Changes:

Summary for salt_master
------------
Succeeded: 5 (changed=3)
Failed:    1
------------
Total states run:     6
Total run time:   7.074 s

~ Kevin
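One detail worth noting: the transcript above edits /etc/salt/minion_id but does not show a salt-minion restart before the key is re-accepted, and Eric's sequence includes one. A compact version of that sequence, using this cluster's 'salt' minion purely as the example:

```bash
# Sketch of the rename sequence from Eric's reply (minion 'salt' used as the example).
salt-key -d salt                          # admin node: delete the stale key
# on the affected minion:
#   echo salt > /etc/salt/minion_id
#   systemctl restart salt-minion         # required before the new key is presented
salt-key -A                               # admin node: accept the re-submitted key
salt 'salt*' state.apply ceph.mgr.key     # re-test just the failing step
```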
From dbyte at suse.com  Sat Sep 22 23:35:17 2018
From: dbyte at suse.com (David Byte)
Date: Sun, 23 Sep 2018 05:35:17 +0000
Subject: [Deepsea-users] Stage 2 error on Azure
In-Reply-To: <2BD5F895-B84E-4A19-84A3-A853E1B692A6@suse.com>
References: <2179785.o9lRQUWKH7@fury.home> <2BD5F895-B84E-4A19-84A3-A853E1B692A6@suse.com>
Message-ID: <0CF256E4-87D5-48C8-A76E-779C3D70ABB0@suse.com>

I know you are going to love this, but I suspect you have stuff cached that needs to be cleaned up. The issues you are seeing are exactly the things where I use my cleanit.sh script to wipe it all out and allow me to start over.

David Byte
Sr. Technology Strategist
SCE Enterprise Linux
SCE Enterprise Storage
Alliances and SUSE Embedded
dbyte at suse.com
918.528.4422
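David's cleanit.sh itself is not posted in the thread; as a rough sketch of the kind of cache cleanup he is pointing at, the stock Salt equivalents would be:

```bash
# Not David's cleanit.sh -- just generic Salt cache hygiene covering similar ground:
# stale grains/pillar/mine data cached on the master and the minions.
salt '*' saltutil.clear_cache      # clear each minion's local cache
salt '*' saltutil.refresh_pillar   # re-render pillar data on the minions
salt '*' mine.update               # repopulate the Salt mine
salt-run cache.clear_all tgt='*'   # drop the master's cached grains/pillar/mine for all minions
```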
_______________________________________________
Deepsea-users mailing list
Deepsea-users at lists.suse.com
http://lists.suse.com/mailman/listinfo/deepsea-users

From tserong at suse.com  Sun Sep 23 23:23:58 2018
From: tserong at suse.com (Tim Serong)
Date: Mon, 24 Sep 2018 15:23:58 +1000
Subject: [Deepsea-users] Stage 2 error on Azure
In-Reply-To: <0CF256E4-87D5-48C8-A76E-779C3D70ABB0@suse.com>
References: <2179785.o9lRQUWKH7@fury.home> <2BD5F895-B84E-4A19-84A3-A853E1B692A6@suse.com> <0CF256E4-87D5-48C8-A76E-779C3D70ABB0@suse.com>
Message-ID: <4b5517a2-1224-8c38-9b09-4e7805062870@suse.com>

Have a look in /etc/hosts, and make sure the system hostname does *not*
appear on either the IPv4 or IPv6 localhost lines. That was the problem
last time I saw this issue (presumably something was resolving the
hostname, getting 127.0.0.1 or ::1 back, then doing a reverse lookup on
that address, and finishing up with localhost).

Regards,

Tim
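A minimal check for what Tim describes, run on each node (the hostname is taken from the node itself; nothing cluster-specific assumed):

```bash
# The system hostname must not resolve to a loopback address, otherwise minions
# end up identifying themselves as "localhost".
grep -nE '^(127\.0\.0\.1|::1)' /etc/hosts   # the hostname should NOT appear on these lines
getent hosts "$(hostname)"                  # should return the node's real IP, not 127.0.0.1/::1
hostname -f                                 # should print the proper FQDN, not localhost
```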
--
Tim Serong
Senior Clustering Engineer
SUSE
tserong at suse.com