[Deepsea-users] stage 1 errors on Azure

Kevin Ayres kevin.ayres at suse.com
Wed Sep 19 16:03:05 MDT 2018


Thanks Eric! I understand. It couldn't find localhost – DOH moment. The salt-master restarted fine each time without errors, but the API kept failing.


salt:~ # salt-run validate.saltapi

salt-api                 : ["Salt API is failing to authenticate - try 'systemctl restart salt-master': list index out of range"]

False

Localhost was missing due to the heavy /etc/hosts modifications I made for Azure instance resolution.

I just appended "localhost" to the line "127.0.0.1       salt.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net salt".
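The failure mode is easy to self-check: "localhost" just has to appear somewhere as an alias of 127.0.0.1. A minimal sketch (the helper name and the example hostnames are mine, not DeepSea's):

```python
# Hypothetical helper: verify that /etc/hosts-style text still maps the name
# "localhost" to 127.0.0.1 after custom cloud entries have been appended --
# the exact thing that broke validate.saltapi here.
def has_localhost(hosts_text):
    for line in hosts_text.splitlines():
        entry = line.split("#", 1)[0].split()  # drop comments, tokenize
        if entry and entry[0] == "127.0.0.1" and "localhost" in entry[1:]:
            return True
    return False

before = "127.0.0.1       salt.example.internal salt\n"
after  = "127.0.0.1       salt.example.internal salt localhost\n"
print(has_localhost(before))  # False
print(has_localhost(after))   # True
```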

salt:~ # salt-run validate.saltapi
salt-api                 : valid

salt:~ # curl -si localhost:8000/login -H "Accept: application/json" -d username=admin -d sharedsecret=fbc39dd2-2bba-42ec-ab9b-7d9e71b84047 -d eauth=sharedsecret
HTTP/1.1 200 OK
Content-Length: 204
Access-Control-Expose-Headers: GET, POST
Vary: Accept-Encoding
Server: CherryPy/3.6.0
Allow: GET, HEAD, POST
Access-Control-Allow-Credentials: true
Date: Wed, 19 Sep 2018 21:39:28 GMT
Access-Control-Allow-Origin: *
X-Auth-Token: 640bb306bd8fb202ef71757aac83f0db9beb4e11
Content-Type: application/json
Set-Cookie: session_id=640bb306bd8fb202ef71757aac83f0db9beb4e11; expires=Thu, 20 Sep 2018 07:39:28 GMT; Path=/

{"return": [{"perms": [".*", "@runner", "@wheel"], "start": 1537393168.443645, "token": "640bb306bd8fb202ef71757aac83f0db9beb4e11", "expire": 1537436368.443646, "user": "admin", "eauth": "sharedsecret"}]}salt:~ #

Now Stage 1 runs through.
salt:~ # salt-run state.orch ceph.stage.discovery
salt-api                 : valid
deepsea_minions          : valid
master_minion            : valid
ceph_version             : valid
[WARNING ] All minions are ready
{}
salt_master:
  Name: minions.ready - Function: salt.runner - Result: Changed Started: - 21:59:25.186357 Duration: 1528.309 ms
  Name: refresh_pillar0 - Function: salt.state - Result: Changed Started: - 21:59:26.714801 Duration: 340.017 ms
  Name: populate.proposals - Function: salt.runner - Result: Changed Started: - 21:59:27.055255 Duration: 5107.852 ms
  Name: proposal.populate - Function: salt.runner - Result: Changed Started: - 21:59:32.163281 Duration: 2578.835 ms

Summary for salt_master
------------
Succeeded: 4 (changed=4)
Failed:    0
------------
Total states run:     4
Total run time:   9.555 s

And Proposals exist.

salt:~ # ls /srv/pillar/ceph/proposals/
cluster-ceph        config           role-admin          role-client-cephfs  role-client-nfs      role-ganesha  role-master  role-mgr  role-openattic
cluster-unassigned  profile-default  role-benchmark-rbd  role-client-iscsi   role-client-radosgw  role-igw      role-mds     role-mon  role-rgw

Thanks again Eric!

~ Kevin

From: <deepsea-users-bounces at lists.suse.com> on behalf of Eric Jackson <ejackson at suse.com>
Reply-To: Discussions about the DeepSea management framework for Ceph <deepsea-users at lists.suse.com>
Date: Wednesday, September 19, 2018 at 1:39 PM
To: "deepsea-users at lists.suse.com" <deepsea-users at lists.suse.com>
Subject: Re: [Deepsea-users] stage 1 errors on Azure


The DeepSea rpm would be installed after Salt is configured. I understand some installations install both Salt and DeepSea via YaST; we did try to make accommodations for that scenario.



So, your salt-api is still down. We ran into drastically different reasons why the salt-api can fail. We did not have the bandwidth to address each type of failure. The check we do is a curl command to verify that the salt-api is answering.



curl -si localhost:8000/login -H "Accept: application/json" \
  -d username=admin -d sharedsecret=xxx -d eauth=sharedsecret



The sharedsecret is in /etc/salt/master/sharedsecret.conf. It's possible for the Salt master to remember the previous contents of that file while the Salt API uses the current contents, if an admin does things just right :) . That's why we give the error message about restarting the Salt master.
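For reference, a sketch of reading the on-disk side of that comparison, assuming the file is the usual one-line YAML fragment of the form `sharedsecret: <uuid>` (the helper name is mine; the running master's remembered value can only be seen by restarting or querying it):

```python
# Hypothetical helper: extract the sharedsecret value from a small
# "key: value" YAML fragment like /etc/salt/master/sharedsecret.conf.
def read_sharedsecret(conf_text):
    for line in conf_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "sharedsecret":
            return value.strip().strip("'\"")   # tolerate quoted values
    return None

conf = "sharedsecret: fbc39dd2-2bba-42ec-ab9b-7d9e71b84047\n"
print(read_sharedsecret(conf))  # fbc39dd2-2bba-42ec-ab9b-7d9e71b84047
```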



However, the above check can also fail because the Salt API is down, or because localhost is not defined in /etc/hosts (we ran into that once). Running the curl command by hand may shed more light on the failure.



***

As for how you would have found that curl command: take a look at the contents of /srv/modules/runners/validate.py. Around line 626, you will see that the Python code is literally calling the curl command. In other words, do not be intimidated by the Python code. Much of it calls the same command-line tools that you would use; we are just using Salt to do much of this in parallel.
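In the same spirit, the check boils down to shelling out to curl against the login endpoint. This sketch only builds that argv (function name and structure are mine, not validate.py's); actually running it needs a live salt-api, so the call is left commented out:

```python
# Illustration: construct the same curl invocation Eric shows above, ready to
# hand to subprocess.run(). "xxx" stands in for the real sharedsecret.
def build_login_cmd(secret, host="localhost", port=8000):
    return ["curl", "-si", "{}:{}/login".format(host, port),
            "-H", "Accept: application/json",
            "-d", "username=admin",
            "-d", "sharedsecret={}".format(secret),
            "-d", "eauth=sharedsecret"]

cmd = build_login_cmd("xxx")
print(" ".join(cmd))
# subprocess.run(cmd, capture_output=True)  # only with salt-api up
```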



It's also possible to run this directly without invoking Stage 1.



# salt-run validate.saltapi



Although the validations can be frustrating, not having them is worse. The situations where we did not check the Salt API led to incredibly painful debug sessions.



Eric



On Wednesday, September 19, 2018 3:56:17 PM EDT Kevin Ayres wrote:

> Thanks Eric, Yes, I understand this but worded it poorly. I don't see any
> issues with NTP or DNS. Something else is amiss. Should deepsea be
> installed after salt as outlined in the deployment doc, or before?
>
> salt:~ # salt-run state.orch ceph.stage.discovery
> salt-api                 : ["Salt API is failing to authenticate - try
> 'systemctl restart salt-master': list index out of range"]
> deepsea_minions          : valid
> master_minion            : valid
> ceph_version             : valid
> [ERROR ] No highstate or sls specified, no execution made
> salt_master:
> ----------
>           ID: salt-api failed
>     Function: salt.state
>         Name: just.exit
>       Result: False
>      Comment: No highstate or sls specified, no execution made
>      Started: 19:38:41.962044
>     Duration: 0.734 ms
>      Changes:
>
> Summary for salt_master
> ------------
> Succeeded: 0
> Failed:    1
> ------------
> Total states run:     1
> Total run time:   0.734 ms
>
> salt:~ # tail -f /var/log/salt/master
> 2018-09-19 18:44:36,555 [salt.loaded.ext.runners.minions][WARNING ][15319] All minions are ready
> 2018-09-19 19:38:41,955 [salt.transport.ipc][ERROR ][1626] Exception occurred while handling stream: [Errno 0] Success
> 2018-09-19 19:38:41,962 [salt.state       ][ERROR   ][40826] No highstate or sls specified, no execution made
>
> salt:~ # ls /srv/pillar/ceph/proposals
> ls: cannot access '/srv/pillar/ceph/proposals': No such file or directory
>
> salt:~ # ls /srv/pillar/ceph/
> benchmarks  deepsea_minions.sls  deepsea_minions.sls.rpmsave  init.sls  master_minion.sls  master_minion.sls.rpmsave  stack
>
> ~ Kevin
>

> On 9/19/18, 12:37 PM, "deepsea-users-bounces at lists.suse.com on behalf of
> Eric Jackson" <deepsea-users-bounces at lists.suse.com on behalf of
> ejackson at suse.com> wrote:
>
> Hi Kevin,
>   Stage 0 only does the "preparation" part. That is, sync'ing salt modules,
> zypper updates, etc. Stage 1 is the "discovery" part that interrogates the
> minions and then creates the roles and storage fragments. If your salt-api
> issue is resolved, Stage 1 should run relatively quickly.
>
> Eric
>

> On Wednesday, September 19, 2018 3:20:43 PM EDT Kevin Ayres wrote:

> > Thanks Joel, yes DNS, NTP is configured and behaving correctly. SP3/SES5
> > from current repo. salt-api service, master, minion service running (with
> > one error). I'm walking through the Deployment guide line by line with the
> > same result, now on my second freshly built master node. Salt output is at
> > the bottom of this message. Key: After stage 0, the */proposals directory
> > has NOT been created.
> >
> > Here's my build on a single flat network (Azure vNet 172.19.20.0/24):
> > Root ssh enabled and key-based login from master to all nodes as root. All
> > nodes rebooted before salt stage. All nodes using identical image and
> > fully patched CPE_NAME="cpe:/o:suse:sles:12:sp3", firewall off, etc. - the
> > Azure instance defaults.
> >
> > Salt (and all nodes):~ # zypper lr -E
> > Repository priorities are without effect. All enabled repositories share the same priority.
> > #  | Alias                                                              | Name                              | Enabled | GPG Check | Refresh
> > ---+--------------------------------------------------------------------+-----------------------------------+---------+-----------+--------
> >  3 | SUSE_Enterprise_Storage_5_x86_64:SUSE-Enterprise-Storage-5-Pool    | SUSE-Enterprise-Storage-5-Pool    | Yes     | (r ) Yes  | No
> >  5 | SUSE_Enterprise_Storage_5_x86_64:SUSE-Enterprise-Storage-5-Updates | SUSE-Enterprise-Storage-5-Updates | Yes     | (r ) Yes  | Yes
> >  8 | SUSE_Linux_Enterprise_Server_12_SP3_x86_64:SLES12-SP3-Pool         | SLES12-SP3-Pool                   | Yes     | (r ) Yes  | No
> > 10 | SUSE_Linux_Enterprise_Server_12_SP3_x86_64:SLES12-SP3-Updates      | SLES12-SP3-Updates                | Yes     | (r ) Yes  | Yes
> >
> > **DNS** all nodes resolve bidirectionally. Azure cares for DNS but I've
> > also updated hosts files.
> >
> > salt:~ # hostname
> > salt
> > salt:~ # ping salt
> > PING salt.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net (127.0.0.1) 56(84) bytes of data.
> > 64 bytes from salt.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net (127.0.0.1): icmp_seq=1 ttl=64 time=0.030 ms
> >
> > 104.211.27.224 Outside NAT to 172.19.20.10 salt.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net salt
> > 172.19.20.12 mon1.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net mon1
> > 172.19.20.13 mon2.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net mon2
> > 172.19.20.14 mon3.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net mon3
> > 172.19.20.15 osd1.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net osd1
> > 172.19.20.16 osd2.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net osd2
> > 172.19.20.17 osd3.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net osd3
> > 172.19.20.18 igw1.acylew2ti3nulm1e5a1hcxdv0h.bx.internal.cloudapp.net ogw1
> >
> > **NTP** all nodes on a 5-minute sync interval to the same Stratum 1 server
> > in the same GEO as the Azure AZ (US East), navobs1.gatech.edu, as shown:
> >
> > bash-3.2$ pssh -h pssh-hosts -l sesuser -i sudo ntpq -p
> > [1] 11:15:27 [SUCCESS] mon3
> >      remote           refid      st t when poll reach   delay   offset  jitter
> > ==============================================================================
> > *navobs1.gatech. .GPS.            1 u   19   64    1   15.596   -4.863   0.333
> > [2] 11:15:27 [SUCCESS] salt
> >      remote           refid      st t when poll reach   delay   offset  jitter
> > ==============================================================================
> >  navobs1.gatech. .GPS.            1 u   42   64    1   17.063   -6.702   0.000
> > [3] 11:15:27 [SUCCESS] igw1
> >      remote           refid      st t when poll reach   delay   offset  jitter
> > ==============================================================================
> > *navobs1.gatech. .GPS.            1 u   18   64    1   17.394  -27.874   7.663
> > [4] 11:15:27 [SUCCESS] osd1
> >      remote           refid      st t when poll reach   delay   offset  jitter
> > ==============================================================================
> > *navobs1.gatech. .GPS.            1 u   21   64    1   16.962   -3.755   0.813
> > [5] 11:15:27 [SUCCESS] osd2
> >      remote           refid      st t when poll reach   delay   offset  jitter
> > ==============================================================================
> > *navobs1.gatech. .GPS.            1 u   22   64    1   15.832   -4.709   3.062
> > [6] 11:15:27 [SUCCESS] osd3
> >      remote           refid      st t when poll reach   delay   offset  jitter
> > ==============================================================================
> > *navobs1.gatech. .GPS.            1 u   26   64    1   15.877   -3.252  19.131
> > [7] 11:15:27 [SUCCESS] mon1
> >      remote           refid      st t when poll reach   delay   offset  jitter
> > ==============================================================================
> >  navobs1.gatech. .GPS.            1 u    2   64    1   16.120   -4.263   0.000
> > [8] 11:15:27 [SUCCESS] mon2
> >      remote           refid      st t when poll reach   delay   offset  jitter
> > ==============================================================================
> >  navobs1.gatech. .GPS.            1 u    2   64    1   16.108   -7.713   0.959
> >

> > **SALT**
> > salt:~ # systemctl status salt-api salt-master salt-minion |grep 'active (running)'
> > Active: active (running) since Wed 2018-09-19 18:14:39 UTC; 13min ago
> > Active: active (running) since Wed 2018-09-19 18:14:42 UTC; 13min ago
> > Active: active (running) since Wed 2018-09-19 18:14:41 UTC; 13min ago
> > salt:~ # systemctl status salt-api salt-master salt-minion |grep ERROR
> > Sep 19 18:14:49 salt salt-minion[1413]: [ERROR ] Function cephimages.list in mine_functions failed to execute
> >
> > salt:~ # salt-key --list-all
> > Accepted Keys:
> > igw1
> > mon1
> > mon2
> > mon3
> > osd1
> > osd2
> > osd3
> > salt
> > Denied Keys:
> > Unaccepted Keys:
> > Rejected Keys:
> >
> > salt:~ # salt '*' test.ping
> > salt:
> >     True
> > osd2:
> >     True
> > mon3:
> >     True
> > osd3:
> >     True
> > osd1:
> >     True
> > mon2:
> >     True
> > igw1:
> >     True
> > mon1:
> >     True
> >
> > salt:~ # cat /srv/pillar/ceph/master_minion.sls
> > master_minion: salt
> >
> > salt:~ # cat /srv/pillar/ceph/deepsea_minions.sls
> > ...
> > # Choose all minions
> > deepsea_minions: '*'
> > ...
> >
> > **SALT STAGES**
> > Stage 0 is successful with no errors but does not create the proposals folder.
> >
> > salt:~ # salt-run state.orch ceph.stage.prep
> > deepsea_minions          : valid
> > master_minion            : valid
> > ceph_version             : valid
> > [WARNING ] All minions are ready
> > salt_master:
> >   Name: sync master - Function: salt.state - Result: Changed Started: - 18:44:20.440255 Duration: 949.98 ms
> >   Name: salt-api - Function: salt.state - Result: Changed Started: - 18:44:21.390365 Duration: 3256.749 ms
> >   Name: repo master - Function: salt.state - Result: Clean Started: - 18:44:24.647227 Duration: 351.0 ms
> >   Name: metapackage master - Function: salt.state - Result: Clean Started: - 18:44:24.998333 Duration: 1127.063 ms
> >   Name: prepare master - Function: salt.state - Result: Changed Started: - 18:44:26.125514 Duration: 4109.917 ms
> >   Name: filequeue.remove - Function: salt.runner - Result: Changed Started: - 18:44:30.235610 Duration: 2071.199 ms
> >   Name: restart master - Function: salt.state - Result: Clean Started: - 18:44:32.306972 Duration: 1006.268 ms
> >   Name: filequeue.add - Function: salt.runner - Result: Changed Started: - 18:44:33.313369 Duration: 1352.98 ms
> >   Name: minions.ready - Function: salt.runner - Result: Changed Started: - 18:44:34.666528 Duration: 1891.677 ms
> >   Name: repo - Function: salt.state - Result: Clean Started: - 18:44:36.558363 Duration: 553.342 ms
> >   Name: metapackage minions - Function: salt.state - Result: Clean Started: - 18:44:37.111825 Duration: 3993.733 ms
> >   Name: common packages - Function: salt.state - Result: Clean Started: - 18:44:41.105706 Duration: 2434.079 ms
> >   Name: sync - Function: salt.state - Result: Changed Started: - 18:44:43.539897 Duration: 1381.692 ms
> >   Name: mines - Function: salt.state - Result: Clean Started: - 18:44:44.921708 Duration: 1657.019 ms
> >   Name: updates - Function: salt.state - Result: Changed Started: - 18:44:46.578853 Duration: 11183.347 ms
> >   Name: restart - Function: salt.state - Result: Clean Started: - 18:44:57.762346 Duration: 1553.957 ms
> >   Name: mds restart noop - Function: test.nop - Result: Clean Started: - 18:44:59.316442 Duration: 0.348 ms
> >
> > Summary for salt_master
> > -------------
> > Succeeded: 17 (changed=8)
> > Failed:     0
> > -------------
> > Total states run:     17
> > Total run time:   38.874 s
> >
> > Before running Stage 1, the /srv/pillar/ceph/proposals directory does not exist.
> >
> > salt:~ # ls /srv/pillar/ceph/proposals/
> > ls: cannot access '/srv/pillar/ceph/proposals/': No such file or directory
> >
> > That's where I'm at – Googling...
> >
> > ~ Kevin
> >
> > From: <deepsea-users-bounces at lists.suse.com> on behalf of Joel Zhou <joel.zhou at suse.com>
> > Reply-To: Discussions about the DeepSea management framework for Ceph <deepsea-users at lists.suse.com>
> > Date: Tuesday, September 18, 2018 at 11:34 PM
> > To: Discussions about the DeepSea management framework for Ceph <deepsea-users at lists.suse.com>
> > Subject: Re: [Deepsea-users] stage 1 errors on Azure
> >
> > Hi Kevin,
> >
> > My short answer is,
> >
> > Step 1, before stage 0, check your salt-api service on the salt-master node first.
> > ```bash
> > zypper install -y salt-api
> > systemctl enable salt-api.service
> > systemctl start salt-api.service
> > ```
> > Step 2, make sure the NTP service works correctly on all nodes, which means time is synchronized correctly on all nodes.
> > Step 3, reboot all your nodes, if acceptable, in case the kernel was updated somehow.
> > Step 4, then you have to start over again from stage 0 to 5.
> >
> > Basically, deepsea is a bunch of salt scripts, and salt is based on python2 and/or python3. I have no clues about your whole running stack, so I assume SLES 12 sp3 + SES 5, which works fine and is supported. More info would be helpful, and also your purpose, such as practice on your own, or PoC/testing to meet a customer's demands.
> >
> > Regards,
> >
> > --
> > Joel Zhou 周维伟
> > Senior Storage Technologist, APJ
> >
> > Mobile: +86 18514577601
> > Email: joel.zhou at suse.com
> >
> > From: <deepsea-users-bounces at lists.suse.com> on behalf of Kevin Ayres <kevin.ayres at suse.com>
> > Reply-To: Discussions about the DeepSea management framework for Ceph <deepsea-users at lists.suse.com>
> > Date: Tuesday, September 18, 2018 at 4:49 PM
> > To: "deepsea-users at lists.suse.com" <deepsea-users at lists.suse.com>
> > Subject: [Deepsea-users] stage 1 errors on Azure
> >
> > Hey guys, I can't seem to get past stage 1. Stage 0 completes successfully. Same output with the deepsea command. The master and minion services are running and bidirectional host resolution is good. Keys are all accepted. From what I can determine, the default files are not created by stage 0 for some reason. Thoughts? What I'm seeing is that it fails to create /srv/pillar/ceph/proposals.
> >
> > I'm running through this doc line by line:
> > https://www.suse.com/documentation/suse-enterprise-storage-5/singlehtml/book_storage_deployment/book_storage_deployment.html#deepsea.cli
> >
> > ~ Kevin
> >
> > salt:~ # salt-run state.orch ceph.stage.discovery
> > salt-api                 : ["Salt API is failing to authenticate - try
> > 'systemctl restart salt-master': list index out of range"]
> > deepsea_minions          : valid
> > master_minion            : valid
> > ceph_version             : valid
> > [ERROR ] No highstate or sls specified, no execution made
> > salt_master:
> > ----------
> >           ID: salt-api failed
> >     Function: salt.state
> >         Name: just.exit
> >       Result: False
> >      Comment: No highstate or sls specified, no execution made
> >      Started: 22:30:53.628882
> >     Duration: 0.647 ms
> >      Changes:
> >
> > Summary for salt_master
> > ------------
> > Succeeded: 0
> > Failed:    1
> > ------------
> > Total states run:     1
> > Total run time:   0.647 ms
> >
> > salt:~ # !tail
> > tail -f /var/log/salt/master
> > 2018-09-18 22:29:08,797 [salt.loaded.ext.runners.validate][WARNING ][8499] role-igw/cluster/igw*.sls matched no files
> > 2018-09-18 22:29:08,797 [salt.loaded.ext.runners.validate][WARNING ][8499] role-openattic/cluster/salt.sls matched no files
> > 2018-09-18 22:29:08,797 [salt.loaded.ext.runners.validate][WARNING ][8499] config/stack/default/global.yml matched no files
> > 2018-09-18 22:29:08,798 [salt.loaded.ext.runners.validate][WARNING ][8499] config/stack/default/ceph/cluster.yml matched no files
> > 2018-09-18 22:29:08,798 [salt.loaded.ext.runners.validate][WARNING ][8499] cluster/*.sls matched no files
> > 2018-09-18 22:29:08,798 [salt.loaded.ext.runners.validate][WARNING ][8499] stack/default/ceph/minions/*.yml matched no files
> > 2018-09-18 22:29:08,822 [salt.state       ][ERROR   ][8499] No highstate or sls specified, no execution made
> > 2018-09-18 22:29:52,472 [salt.transport.ipc][ERROR  ][5672] Exception occurred while handling stream: [Errno 0] Success
> > 2018-09-18 22:29:56,797 [salt.state       ][ERROR   ][8759] No highstate or sls specified, no execution made
> > 2018-09-18 22:30:53,629 [salt.state       ][ERROR   ][9272] No highstate or sls specified, no execution made
> >
> > There's also some issue with the salt-minion.service:
> > ● salt-minion.service - The Salt Minion
> >    Loaded: loaded (/usr/lib/systemd/system/salt-minion.service; enabled; vendor preset: disabled)
> >    Active: active (running) since Tue 2018-09-18 22:46:54 UTC; 12s ago
> >  Main PID: 11082 (salt-minion)
> > ...
> > Sep 18 22:46:54 salt systemd[1]: Started The Salt Minion.
> > Sep 18 22:47:00 salt salt-minion[11082]: [ERROR ] Function cephimages.list in mine_functions failed to execute
>
> _______________________________________________
> Deepsea-users mailing list
> Deepsea-users at lists.suse.com
> http://lists.suse.com/mailman/listinfo/deepsea-users



