[Deepsea-users] Deepsea fails to deploy OSDs in stage 3

Allen Sellars asellars at vigilantnow.com
Tue Jul 9 08:03:48 MDT 2019


Strahil,
	The storage node has 256 GB of RAM, and it's a dedicated physical host. I was able to get the disks to deploy after running ONLY the "dd if=/dev/zero of=/dev/sdX bs=4k count=34 oflag=direct" step.

Note that I modified the block size to accommodate the 4k block size of my disks. After zeroing the first 34 4k sectors again and rerunning stage 3, the OSDs deployed.
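
For anyone checking their own drives before adjusting dd like this, the sector sizes can be read with blockdev from util-linux (/dev/sdX is just a placeholder):

    # Physical sector size in bytes; reports 4096 on 4K-native disks
    blockdev --getpbsz /dev/sdX
    # Logical sector size in bytes, which is the unit GPT structures are laid out in
    blockdev --getss /dev/sdX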

Running all of the listed steps left a prepped GPT partition table on the drives, which deepsea/salt complained about.
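
If anyone else lands in that state, a minimal cleanup sketch that removes both the GPT structures and any leftover filesystem signatures, assuming gptfdisk and util-linux are available (/dev/sdX is a placeholder):

    # Destroy the GPT and MBR data structures on the disk
    sgdisk --zap-all /dev/sdX
    # Erase any remaining filesystem, RAID, or partition-table signatures
    wipefs --all /dev/sdX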


Everything is now deployed, and my cluster and dashboard are up and healthy.

Thanks for the guidance!

Allen
 


On 7/8/19, 1:37 PM, "Strahil Nikolov" <hunter86_bg at yahoo.com> wrote:

    Hi Allen,
    You can run 'deepsea stage run ceph.stage.3', which will show you the exact step that fails.
    Another approach is to use something like: 'salt-run -l debug state.orch ceph.stage.3 | tee -a /somelog'
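
    A variant of that second command that also captures stderr (the log path is only an example):

        salt-run -l debug state.orch ceph.stage.3 2>&1 | tee -a /var/log/deepsea-stage3.log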
    
    Last time I checked the error, it seemed there were issues starting the OSD services.
    How many resources do you have on the nodes? I mean, are you using VMs with too little RAM?
    
    Best Regards,
    Strahil Nikolov
    
    On Monday, July 8, 2019 at 13:15:48 GMT-4, Allen Sellars <asellars at vigilantnow.com> wrote:
    
    So I’ve run through the documented disk reset process that you attached and rebooted the storage server. I’m still getting the same error running stage 3. I adjusted the dd bs setting to accommodate the disks’ 4k block size.
    
    Is there somewhere that might give me more verbose log info on the specific task that’s failing, or a way I can run that particular pillar by hand and debug the steps?
    
    I’m also open to other troubleshooting suggestions. I was able to get everything deployed on these OSDs with deepsea and ceph mimic, so I know the configuration was at least supported by the automation at some point.
    
    
    Allen
    
    From: Strahil <hunter86_bg at yahoo.com>
    Date: Saturday, July 6, 2019 at 9:16 AM
    To: Allen Sellars <asellars at vigilantnow.com>
    Cc: deepsea-users <deepsea-users at lists.suse.com>
    Subject: Re: [Deepsea-users] Deepsea fails to deploy OSDs in stage 3
    
    Also one more thing: check whether ceph-osd@X.service has failed, and if so, look for the reasons behind that.
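
    A minimal way to check a given OSD unit, assuming systemd and an example OSD id of 12:

        # Show the unit's current state and last exit status
        systemctl status ceph-osd@12.service
        # Show the unit's log messages from the current boot
        journalctl -u ceph-osd@12.service -b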
    
    Best Regards,
    Strahil Nikolov
    
    
    On Jul 6, 2019 14:24, Allen Sellars <asellars at vigilantnow.com> wrote:
    
    > gdisk was reporting no MBR and no GPT partitions, so I assumed they were safe to use.
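    > 
    > For reference, the check that produces that output (device name is a placeholder):
    > 
    >     # List the partition table; a clean disk reports "MBR: not present" and "GPT: not present"
    >     gdisk -l /dev/sdX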
    > 
    > I’ll go through zeroing them out with this process and report back.
    > 
    > Thanks
    > 
    > Allen Sellars
    > 
    > asellars at vigilantnow.com
    > 
    > Sent from my iPhone
    > 
    > On Jul 5, 2019, at 18:04, Strahil <hunter86_bg at yahoo.com> wrote:
    > 
    >> Hi Allen,
    >> 
    >> I think that you need empty disks for deepsea to 'target' them.
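    >> 
    >> A quick way to see whether a disk still carries signatures that make it look non-empty (wipefs is read-only when run without -a):
    >> 
    >>     # List any filesystem or partition-table signatures without erasing them
    >>     wipefs /dev/sdX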
    >> 
    >> Can you wipe each partition’s beginning, the disk’s beginning, and the disk’s end?
    >> 
    >> Should be something like:
    >> 
    >> for partition in /dev/sdX[0-9]*
    >> do
    >>     dd if=/dev/zero of="$partition" bs=4096 count=1 oflag=direct
    >> done
    >> 
    >> dd if=/dev/zero of=/dev/sdX bs=512 count=34 oflag=direct
    >> 
    >> dd if=/dev/zero of=/dev/sdX bs=512 count=33 \
    >>     seek=$(( $(blockdev --getsz /dev/sdX) - 33 )) oflag=direct
    >> 
    >> And then create a gpt partition table:
    >> 
    >> sgdisk -Z --clear -g /dev/sdX
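    >> 
    >> If the disks use a 4K logical sector size, a sketch of the same wipe with bs adjusted; this assumes blockdev --getsz reports the size in 512-byte sectors, hence the division by 8:
    >> 
    >>     # Zero the first 34 4K sectors (protective MBR + GPT header + partition entries)
    >>     dd if=/dev/zero of=/dev/sdX bs=4096 count=34 oflag=direct
    >>     # Zero the last 33 4K sectors (backup GPT at the end of the disk)
    >>     dd if=/dev/zero of=/dev/sdX bs=4096 count=33 \
    >>         seek=$(( $(blockdev --getsz /dev/sdX) / 8 - 33 )) oflag=direct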
    >> 
    >> Source: https://www.suse.com/documentation/suse-enterprise-storage-5/pdfdoc/book_storage_deployment/book_storage_deployment.pdf
    >> 
    >> Best Regards,
    >> Strahil Nikolov
    >> 
    >>  
    >> On Jul 6, 2019 00:41, Allen Sellars <asellars at vigilantnow.com> wrote:
    >> 
    >>> 
    >>> I have a Cisco UCS S3260 with 52 6 TB spinning disks and 4 SSDs as DB disks.
    >>> 
    >>> I have no profile-* configs in the proposals directory.
    >>> 
    >>> I’ve obscured FQDNs.
    >>> 
    >>> Stages 0-2 run fine with no failures. I see the following in stage 3:
    >>> 
    >>> When I run salt-run state.orch ceph.stage.3, my salt-master returns this:
    >>> 
    >>>  
    >>> 
    >>> firewall                 : disabled
    >>> 
    >>> apparmor                 : disabled
    >>> 
    >>> subvolume                : skipping
    >>> 
    >>> DEV_ENV                  : True
    >>> 