[caasp-beta] BTRFS space and quota

Le Bihan Stéphane (AMUNDI-ITS) stephane.lebihan at amundi.com
Wed Nov 8 09:30:15 MST 2017


Hi,

For information, we have identified the problem.

As a reminder, my architecture is one hypervisor running SLES12 (SP2) and 5 KVM guests running CaaSP2.
A member of my team patched the hypervisor from SLES 12 SP2 to SLES 12 SP3. After that, we could not connect to the KVM guests over SSH, and the network on the guests did not work: we could not ping the gateway or any other server.
I rolled the hypervisor back to the snapshot taken before the patch (booted the read-only snapshot, then ran "snapper rollback").
After the KVM guests restarted, the network worked correctly.

CaaSP is still not working because we deleted all the files in /var/lib/etcd on all worker and admin nodes.
But I think the full / filesystem was a consequence of the network problem that followed the hypervisor upgrade.

Thanks for your help.

Regards,

Stéphane Le Bihan

SDE/DSI/IPR/SSD/UNX

90, Boulevard Pasteur - 75015 Paris

Web: http://www.amundi.com

Tel: +33 1 76 32 32 08
Unix team: +33 1 76 32 02 30

@: stephane.lebihan at amundi.com
@: sits.unix at amundi.com



From: Le Bihan Stéphane (AMUNDI-ITS)
Sent: Friday, 3 November 2017 09:49
To: 'Ludovic Cavajani'; Paul Gonin; caasp-beta at lists.suse.com
Subject: RE: [caasp-beta] BTRFS space and quota

Hello Ludovic,

I can provide the output now, but we managed to recover the free space yesterday, and I think we have found the cause.

To recover the free space, we stopped etcd.service, removed all the files in /var/lib/etcd, and restarted etcd.service.
# systemctl stop etcd
# rm -rf /etc/sysconfig/etcd/member
# systemctl start etcd

# du -csh /*
4.6M    /bin
44M     /boot
0       /cloud-init-config
8.0K    /dev
12M     /etc
0       /home
318M    /lib
14M     /lib64
0       /mnt
0       /opt
du: cannot access '/proc/24205/task/24205/fd/4': No such file or directory
du: cannot access '/proc/24205/task/24205/fdinfo/4': No such file or directory
du: cannot access '/proc/24205/fd/4': No such file or directory
du: cannot access '/proc/24205/fdinfo/4': No such file or directory
0       /proc
3.4M    /root
218M    /run
5.7M    /sbin
0       /selinux
0       /srv
0       /sys
48K     /tmp
1.8G    /usr
5.4G    /var
7.8G    total

# btrfs fi usage /
Overall:
    Device size:                  30.00GiB
    Device allocated:              5.02GiB
    Device unallocated:           24.99GiB
    Device missing:                  0.00B
    Used:                          2.55GiB
    Free (estimated):             25.50GiB      (min: 13.00GiB)
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:               16.00MiB      (used: 0.00B)

Data,single: Size:3.00GiB, Used:2.49GiB
   /dev/vda6       3.00GiB

Metadata,DUP: Size:1.00GiB, Used:32.59MiB
   /dev/vda6       2.00GiB

System,DUP: Size:9.50MiB, Used:16.00KiB
   /dev/vda6      19.00MiB

Unallocated:
   /dev/vda6      24.99GiB
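
As a rough way to keep an eye on this, the "Device unallocated" figure can be pulled out of the output above and converted to MiB. This is only a sketch that assumes the output layout shown here; the function name unallocated_mib is made up for the example and is not part of btrfs-progs.

```shell
#!/bin/sh
# Hypothetical helper (not part of btrfs-progs): read the output of
# `btrfs filesystem usage <path>` on stdin and print the
# "Device unallocated" value as whole MiB.
unallocated_mib() {
    awk '
        /Device unallocated:/ {
            value = $3                    # e.g. "24.99GiB" or "17.00MiB"
            number = value + 0            # awk keeps the leading numeric part
            unit = value
            sub(/^[0-9.]+/, "", unit)     # drop the number, keep the unit
            if (unit == "GiB")      number *= 1024
            else if (unit == "TiB") number *= 1048576
            else if (unit == "KiB") number /= 1024
            else if (unit == "B")   number /= 1048576
            printf "%d\n", number         # truncate to whole MiB
        }'
}
```

On a live system this would be used as `btrfs filesystem usage / | unallocated_mib` (as root), with whatever alert threshold suits the node.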


Etcd seems OK, but flannel is down.

After investigating, I discovered that from the master and the workers we cannot ping any other server (whether or not it is part of the CaaSP cluster).
I connected to the admin node and it is the same there.

So I searched the history and found that my team patched the hypervisor OS on 22 October.
My architecture is based on KVM on one physical SLES12 SP2 server, but I think that after the hypervisor upgrade to SLES12 SP3, the virtio network card of the KVM guests does not work correctly…

# cat /etc/hosts
#
# hosts         This file describes a number of hostname-to-address
#               mappings for the TCP/IP subsystem.  It is mostly
#               used at boot time, when no name servers are running.
#               On small systems, this file can be used instead of a
#               "named" name server.
# Syntax:
#
# IP-Address  Full-Qualified-Hostname  Short-Hostname
#

127.0.0.1       localhost

# special IPv6 addresses
::1             localhost ipv6-localhost ipv6-loopback

fe00::0         ipv6-localnet

ff00::0         ipv6-mcastprefix
ff02::1         ipv6-allnodes
ff02::2         ipv6-allrouters
ff02::3         ipv6-allhosts

#-- start Salt-CaaSP managed hosts - DO NOT MODIFY --
### service names ###
127.0.0.1 api api.infra.caasp.local dev-kubm01.unix.sits.credit-agricole.fr

### admin nodes ###
10.198.47.219 admin admin.infra.caasp.local

### kubernetes masters ###
10.198.47.220 f74967034d3743f1b843d227df61c7ad f74967034d3743f1b843d227df61c7ad.infra.caasp.local

### kubernetes workers ###
10.198.47.224 82c1065b62f84a508a9e1ffeb45a5cf2 82c1065b62f84a508a9e1ffeb45a5cf2.infra.caasp.local
10.198.47.223 afbe67218e5b4807a16e84997de79c6f afbe67218e5b4807a16e84997de79c6f.infra.caasp.local
10.198.47.221 12b79838fd734263830ffeb74dbb35bb 12b79838fd734263830ffeb74dbb35bb.infra.caasp.local
10.198.47.222 d246e0d7ff5b49c0996ea10c7bb8ca43 d246e0d7ff5b49c0996ea10c7bb8ca43.infra.caasp.local
#-- end Salt-CaaSP managed hosts --

# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:49:ee:13 brd ff:ff:ff:ff:ff:ff
    inet 10.198.47.220/24 brd 10.198.47.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe49:ee13/64 scope link
       valid_lft forever preferred_lft forever

# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.198.47.253   0.0.0.0         UG    0      0        0 eth0
10.198.47.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0

# ping 10.198.47.219
PING 10.198.47.219 (10.198.47.219) 56(84) bytes of data.
^C
--- 10.198.47.219 ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 3999ms

# ping 10.198.47.221
PING 10.198.47.221 (10.198.47.221) 56(84) bytes of data.
^C
--- 10.198.47.221 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2014ms

# ping 10.198.47.253
PING 10.198.47.253 (10.198.47.253) 56(84) bytes of data.
^C
--- 10.198.47.253 ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 4002ms
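
The same check can be repeated for every node in one pass by reading the addresses from the Salt-managed block of /etc/hosts shown above. A minimal sketch, assuming the marker lines stay as they are here and that ping is the iputils version (-c for count, -W for timeout in seconds); managed_ips and check_reachability are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: print the IPv4 addresses found between the
# "start Salt-CaaSP managed hosts" and "end Salt-CaaSP managed hosts"
# markers of a hosts file (default: /etc/hosts), skipping 127.0.0.1.
managed_ips() {
    awk '
        /^#-- start Salt-CaaSP managed hosts/ { inside = 1; next }
        /^#-- end Salt-CaaSP managed hosts/   { inside = 0 }
        inside && $1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ && $1 != "127.0.0.1" { print $1 }
    ' "${1:-/etc/hosts}"
}

# Ping each managed address once, with a 2 second timeout, and report it.
check_reachability() {
    for ip in $(managed_ips "$1"); do
        if ping -c 1 -W 2 "$ip" >/dev/null 2>&1; then
            echo "$ip reachable"
        else
            echo "$ip UNREACHABLE"
        fi
    done
}
```

Running `check_reachability /etc/hosts` on a node lists each cluster address with its status. The default gateway from `route -n` (10.198.47.253 here) is not in that block and has to be pinged separately.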

Regards,



From: Ludovic Cavajani [mailto:ludovic.cavajani at suse.com]
Sent: Thursday, 2 November 2017 16:47
To: Paul Gonin; caasp-beta at lists.suse.com; Le Bihan Stéphane (AMUNDI-ITS)
Subject: Re: [caasp-beta] BTRFS space and quota


Hello Stéphane,

Can you provide us with the output of:

# du -csh /*

Regards,

On 11/02/2017 11:54 AM, Paul Gonin wrote:
-------- Forwarded message --------

Date: Thu, 2 Nov 2017 10:35:13 +0000
Subject: Re: [caasp-beta] BTRFS space and quota
To: Paul Gonin <paul.gonin at suse.com>, caasp-beta at lists.suse.com
From: Le Bihan Stéphane (AMUNDI-ITS) <stephane.lebihan at amundi.com>

Hi Paul,

Here is the output of snapper ls.

# snapper ls
Type   | # | Pre # | Date                            | User | Cleanup | Description           | Userdata
-------+---+-------+---------------------------------+------+---------+-----------------------+---------
single | 0 |       |                                 | root |         | current               |
single | 1 |       | Fri 06 Oct 2017 08:47:14 AM UTC | root |         | first root filesystem |

I deleted the quota on /var/lib/etcd and tried a balance, but that did not help.
I recreated the quota and ran a rescan, and the values are the same as before the deletion.

For information, I ran du -sh on / and the result is 7.8 GB.

# du -sh /
du: cannot access '/proc/7982/task/7982/fd/4': No such file or directory
du: cannot access '/proc/7982/task/7982/fdinfo/4': No such file or directory
du: cannot access '/proc/7982/fd/3': No such file or directory
du: cannot access '/proc/7982/fdinfo/3': No such file or directory
7.8G    /

Regards,



From: Paul Gonin [mailto:paul.gonin at suse.com]
Sent: Thursday, 2 November 2017 10:55
To: Le Bihan Stéphane (AMUNDI-ITS); caasp-beta at lists.suse.com
Subject: Re: [caasp-beta] BTRFS space and quota

Hi Stephane,

What is the output of
# snapper ls
?

I assume that, since there have been no updates yet, it should look like

Type   | # | Pre # | Date                     | User | Cleanup | Description           | Userdata
-------+---+-------+--------------------------+------+---------+-----------------------+--------------
single | 0 |       |                          | root |         | current               |
single | 1 |       | Tue Oct 31 09:07:13 2017 | root |         | first root filesystem |
single | 2 |       | Tue Oct 31 09:10:42 2017 | root | number  | after installation    | important=yes

rgds
Paul

On Tuesday, 31 October 2017 at 13:38 +0000, Le Bihan Stéphane (AMUNDI-ITS) wrote:
Hi Paul,

We work with CaaSP2.

Regards,


From: Paul Gonin [mailto:paul.gonin at suse.com]
Sent: Tuesday, 31 October 2017 14:34
To: Le Bihan Stéphane (AMUNDI-ITS); caasp-beta at lists.suse.com
Subject: Re: [caasp-beta] BTRFS space and quota

Hi Stéphane,

Not that it should make a difference for the issue described, but what version of CaaSP is the cluster running?
Is it CaaSP2? RC1?

thanks
Paul

On Tuesday, 31 October 2017 at 08:35 +0000, Le Bihan Stéphane (AMUNDI-ITS) wrote:
Hello,

We have a strange case on our CaaSP platform with btrfs quota.

For background, I was out of the office for three weeks while other colleagues tested the Kubernetes platform.
When I returned, they asked me for help because the filesystem is full on the master and worker nodes.
I do not know the cause, but I think that, with a bad configuration, the /var/lib/etcd subvolume grew and then shrank again once the configuration was corrected, yet the quota still holds all the space.

When I checked the btrfs usage, the filesystem really is full, but a balance has no effect.
After investigating, I saw that quota is enabled and that the /var/lib/etcd subvolume holds 90% of the space, but I have not managed to release it.

Can you help me release the disk space?


- On the master:

# btrfs filesystem usage /
Overall:
    Device size:                  30.00GiB
    Device allocated:             29.99GiB
    Device unallocated:           17.00MiB
    Device missing:                  0.00B
    Used:                         27.56GiB
    Free (estimated):            504.93MiB      (min: 496.43MiB)
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:               16.00MiB      (used: 0.00B)

Data,single: Size:27.97GiB, Used:27.49GiB
   /dev/vda6      27.97GiB

Metadata,DUP: Size:1.00GiB, Used:32.64MiB
   /dev/vda6       2.00GiB

System,DUP: Size:9.50MiB, Used:16.00KiB
   /dev/vda6      19.00MiB

Unallocated:
   /dev/vda6      17.00MiB

# btrfs fi df /
Data, single: total=27.97GiB, used=27.50GiB
System, DUP: total=9.50MiB, used=16.00KiB
Metadata, DUP: total=1.00GiB, used=32.66MiB
GlobalReserve, single: total=16.00MiB, used=0.00B

# btrfs fi show /
Label: none  uuid: 1b0614eb-fc59-4841-bbc5-5318087f6432
        Total devices 1 FS bytes used 27.53GiB
        devid    1 size 30.00GiB used 29.99GiB path /dev/vda6

# btrfs subvolume list /
ID 257 gen 40 top level 5 path @
ID 258 gen 194820 top level 257 path @/.snapshots
ID 259 gen 197128 top level 258 path @/.snapshots/1/snapshot
ID 260 gen 194810 top level 257 path @/boot/grub2/i386-pc
ID 261 gen 194810 top level 257 path @/boot/grub2/x86_64-efi
ID 262 gen 194810 top level 257 path @/cloud-init-config
ID 263 gen 194810 top level 257 path @/home
ID 264 gen 197081 top level 257 path @/root
ID 265 gen 197111 top level 257 path @/tmp
ID 266 gen 194809 top level 257 path @/var/cache
ID 267 gen 194809 top level 257 path @/var/crash
ID 268 gen 195783 top level 257 path @/var/lib/ca-certificates
ID 269 gen 195783 top level 257 path @/var/lib/cloud
ID 270 gen 24 top level 257 path @/var/lib/docker
ID 271 gen 194810 top level 257 path @/var/lib/dockershim
ID 272 gen 195719 top level 257 path @/var/lib/etcd
ID 273 gen 194810 top level 257 path @/var/lib/kubelet
ID 274 gen 194810 top level 257 path @/var/lib/machines
ID 275 gen 196430 top level 257 path @/var/lib/misc
ID 276 gen 194810 top level 257 path @/var/lib/mysql
ID 277 gen 194810 top level 257 path @/var/lib/nfs
ID 278 gen 194810 top level 257 path @/var/lib/ntp
ID 279 gen 196428 top level 257 path @/var/lib/overlay
ID 280 gen 194810 top level 257 path @/var/lib/rollback
ID 281 gen 196427 top level 257 path @/var/lib/systemd
ID 282 gen 194810 top level 257 path @/var/lib/vmware
ID 283 gen 194810 top level 257 path @/var/lib/wicked
ID 284 gen 197128 top level 257 path @/var/log
ID 285 gen 197111 top level 257 path @/var/spool
ID 286 gen 196428 top level 257 path @/var/tmp

# btrfs qgroup show -pcreFf /var/lib/etcd
qgroupid         rfer         excl     max_rfer     max_excl parent  child
--------         ----         ----     --------     -------- ------  -----
0/272        25.14GiB     25.14GiB         none         none ---     ---

# du -sh /var/lib/etcd/
417M    /var/lib/etcd/



- On one worker:


# btrfs fi usage /
Overall:
    Device size:                  30.00GiB
    Device allocated:             30.00GiB
    Device unallocated:            1.00MiB
    Device missing:                  0.00B
    Used:                         27.94GiB
    Free (estimated):            135.28MiB      (min: 135.28MiB)
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:               16.00MiB      (used: 0.00B)

Data,single: Size:27.99GiB, Used:27.86GiB
   /dev/vda6      27.99GiB

Metadata,DUP: Size:1.00GiB, Used:43.44MiB
   /dev/vda6       2.00GiB

System,DUP: Size:8.00MiB, Used:16.00KiB
   /dev/vda6      16.00MiB

Unallocated:
   /dev/vda6       1.00MiB


# btrfs fi df /
Data, single: total=27.99GiB, used=27.86GiB
System, DUP: total=8.00MiB, used=16.00KiB
Metadata, DUP: total=1.00GiB, used=43.44MiB
GlobalReserve, single: total=16.00MiB, used=0.00B

# btrfs fi show /
Label: none  uuid: 1d7b76f8-f91c-47e0-8be2-a3f02f90ac96
        Total devices 1 FS bytes used 27.90GiB
        devid    1 size 30.00GiB used 30.00GiB path /dev/vda6

# btrfs qgroup show -pcreFf /var/lib/etcd
qgroupid         rfer         excl     max_rfer     max_excl parent  child
--------         ----         ----     --------     -------- ------  -----
0/272        20.99GiB     20.99GiB         none         none ---     ---

# du -sh /var/lib/etcd/
452M    /var/lib/etcd/
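
The gap between what the qgroup reports (rfer) and what du sees in /var/lib/etcd can be put into numbers. A small sketch, assuming the size suffixes printed above and that du -h uses powers of 1024; to_mib and qgroup_gap_mib are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: convert a size such as "25.14GiB" or "452M" to
# whole MiB. Treating du's "K"/"M"/"G" suffixes as KiB/MiB/GiB is an
# assumption that matches GNU du -h (powers of 1024).
to_mib() {
    echo "$1" | awk '{
        number = $1 + 0                   # awk keeps the leading numeric part
        unit = $1
        sub(/^[0-9.]+/, "", unit)         # drop the number, keep the unit
        if (unit == "GiB" || unit == "G")      number *= 1024
        else if (unit == "TiB" || unit == "T") number *= 1048576
        else if (unit == "KiB" || unit == "K") number /= 1024
        printf "%d\n", number             # truncate to whole MiB
    }'
}

# Print how many MiB the qgroup references beyond what du can see.
# $1 = rfer value from `btrfs qgroup show`, $2 = total from `du -sh`.
qgroup_gap_mib() {
    echo $(( $(to_mib "$1") - $(to_mib "$2") ))
}
```

With the master values, `qgroup_gap_mib 25.14GiB 417M` prints 25326, and with the worker values, `qgroup_gap_mib 20.99GiB 452M` prints 21041, so roughly 20 to 25 GiB are referenced by the subvolume without being visible to du.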


Regards,


_______________________________________________

caasp-beta mailing list

caasp-beta at lists.suse.com

http://lists.suse.com/mailman/listinfo/caasp-beta

