[Deepsea-users] timeout disengage.safety

Robert Grosschopff Robert.Grosschopff at suse.com
Tue Aug 7 15:01:55 MDT 2018


Hi Eric,

thanks a lot. Will run the individual steps tomorrow.

My feeling is that the individual stages take too long to finish. If the timeout is one minute (I thought it was 300 seconds), that explains it.
Running 'date; salt-run disengage.safety; date; salt-run disengage.check; date; salt-run disengage.check' showed me that the last check returns False. No wonder the purge does not go through.
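
The timing run above can also be written as a small script. This is only a minimal sketch, assuming salt-run is on the PATH of the admin node. The time_step helper is a made-up name for illustration; only disengage.safety and disengage.check come from this thread. Whether the exit status of disengage.check reflects its True/False result is not something I have confirmed, so the runner output is left visible.

```shell
#!/bin/sh
# time_step (hypothetical helper): run a command, leave its output visible,
# and report how many whole seconds it took plus its exit status.
time_step() {
    start=$(date +%s)
    status=0
    "$@" || status=$?
    end=$(date +%s)
    echo "$* took $((end - start))s (exit $status)"
    return "$status"
}

# Only call the DeepSea runners where salt-run is actually installed.
if command -v salt-run >/dev/null 2>&1; then
    time_step salt-run disengage.safety
    time_step salt-run disengage.check
    # Second check: shows whether the safety has already re-engaged.
    time_step salt-run disengage.check
fi
```

If the second check comes back False within the measured interval, the safety re-engaged before the purge could start.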

test.ping works fine, and so do regular pings. 'salt-run net.ping' tries to ping the cluster network IPs from the public IP, which it cannot reach, so it takes quite long to come back. I suppose this is how it should be. Excluding the cluster IPs makes it fast, and the rtt is somewhere around 0.9 ms.

I am not sure where to look to determine why disengage.safety takes about 30-40 seconds, causing ceph.purge to run into the timeout.

Thanks
Robert

 

-----Original Message-----
From: <deepsea-users-bounces at lists.suse.com> on behalf of Eric Jackson <ejackson at suse.com>
Reply-To: Discussions about the DeepSea management framework for Ceph <deepsea-users at lists.suse.com>
Date: Tuesday, 7. August 2018 at 21:05
To: "deepsea-users at lists.suse.com" <deepsea-users at lists.suse.com>
Subject: Re: [Deepsea-users] timeout disengage.safety

    
    Hi Robert,
    The timeout is one minute. Are all the minions responsive? As far as purging without the check, you can run the three steps in /srv/salt/ceph/purge/default.sls directly.
     
    salt 'admin*' state.apply ceph.reset
    salt -I cluster:ceph state.apply ceph.rescind.storage.terminate
    salt -I cluster:ceph state.apply ceph.rescind
     
    Or copy default.sls to another name such as mypurge.sls, remove the check, and then run it:
     
    salt-run state.orch ceph.purge.mypurge
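
The copy step could look roughly like the sketch below. It assumes the purge directory is /srv/salt/ceph/purge as mentioned in this message; the copy_purge_sls function and the PURGE_DIR variable are invented for illustration and are not part of DeepSea. Removing the check itself is left as a manual edit, since the contents of default.sls are not shown in this thread.

```shell
#!/bin/sh
# copy_purge_sls (hypothetical helper): copy default.sls to mypurge.sls
# inside the given directory, refusing to overwrite an existing copy.
copy_purge_sls() {
    dir=$1
    if [ ! -f "$dir/default.sls" ]; then
        echo "no default.sls in $dir" >&2
        return 1
    fi
    if [ -e "$dir/mypurge.sls" ]; then
        echo "$dir/mypurge.sls already exists, leaving it alone" >&2
        return 1
    fi
    cp "$dir/default.sls" "$dir/mypurge.sls"
}

# PURGE_DIR is an illustrative override; the default is the path from this thread.
PURGE_DIR=${PURGE_DIR:-/srv/salt/ceph/purge}
if [ -d "$PURGE_DIR" ]; then
    copy_purge_sls "$PURGE_DIR" &&
        echo "now edit $PURGE_DIR/mypurge.sls and remove the check"
fi
```

After the manual edit, the orchestration is run under the new name, as in the command above.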
     
    Eric
    
     
    On Tuesday, August 7, 2018 10:53:15 AM EDT Robert Grosschopff wrote:
    > Hi *,
    > 
    > I'd like to purge an existing cluster. Using 'salt-run disengage.safety;
    > salt-run state.orch ceph.purge'
    > 
    > Unfortunately, it takes salt-run almost 30 seconds to come back. By the time
    > the purge runs, safety is already engaged again. Where can I increase the
    > timeout so I can get rid of the cluster without reinstalling it from scratch?
    > Where can I see why it takes so long? On some systems with deepsea version
    > 0.8.4+git.0.a58d1c5d4 I do not get any "dmidecode not found" messages; on
    > others (0.8.2+git.0.6b39c2648) I keep getting that (useless) error message.
    > Can I just download a current deepsea version from git and do a "make
    > install", or will it screw up the system?
    > 
    > Thanks
    > Robert
    > 
     
     
    


