[caasp-beta] BTRFS space and quota]
Le Bihan Stéphane (AMUNDI-ITS)
stephane.lebihan at amundi.com
Wed Nov 8 09:30:15 MST 2017
Hi,
For information, we have identified the problem.
As a reminder, my architecture is one hypervisor running SLES 12 SP2 hosting 5 KVM guests running CaaSP 2.
A member of my team patched the hypervisor from SLES 12 SP2 to SLES 12 SP3. After that we could not connect to the KVM guests with ssh, and on the guests the network did not work: we could not ping the gateway or any other server.
I rolled the hypervisor back to the snapshot taken before the patch (booted the read-only snapshot, then ran "snapper rollback").
The KVM guests restarted and the network works correctly.
CaaSP is not working because we deleted all files in /var/lib/etcd on all worker and admin nodes.
But I think the full / filesystem was a consequence of the network problem that followed the hypervisor upgrade.
Thanks for your help.
Regards,
Stéphane Le Bihan
SDE/DSI/IPR/SSD/UNX
90, Boulevard Pasteur - 75015 Paris
Web: http://www.amundi.com
Tel: +33 1 76 32 32 08
Unix team: +33 1 76 32 02 30
@: stephane.lebihan at amundi.com
@: sits.unix at amundi.com
From: Le Bihan Stéphane (AMUNDI-ITS)
Sent: Friday, 3 November 2017 09:49
To: 'Ludovic Cavajani'; Paul Gonin; caasp-beta at lists.suse.com
Subject: RE: [caasp-beta] BTRFS space and quota]
Hello Ludovic,
I can provide the output now, but we succeeded in recovering the free space yesterday, and I think we have found the cause.
To recover the free space we stopped etcd.service, removed all files in /var/lib/etcd, and restarted etcd.service.
# systemctl stop etcd
# rm -rf /etc/sysconfig/etcd/member
# systemctl start etcd
# du -csh /*
4.6M /bin
44M /boot
0 /cloud-init-config
8.0K /dev
12M /etc
0 /home
318M /lib
14M /lib64
0 /mnt
0 /opt
du: cannot access '/proc/24205/task/24205/fd/4': No such file or directory
du: cannot access '/proc/24205/task/24205/fdinfo/4': No such file or directory
du: cannot access '/proc/24205/fd/4': No such file or directory
du: cannot access '/proc/24205/fdinfo/4': No such file or directory
0 /proc
3.4M /root
218M /run
5.7M /sbin
0 /selinux
0 /srv
0 /sys
48K /tmp
1.8G /usr
5.4G /var
7.8G total
# btrfs fi usage /
Overall:
Device size: 30.00GiB
Device allocated: 5.02GiB
Device unallocated: 24.99GiB
Device missing: 0.00B
Used: 2.55GiB
Free (estimated): 25.50GiB (min: 13.00GiB)
Data ratio: 1.00
Metadata ratio: 2.00
Global reserve: 16.00MiB (used: 0.00B)
Data,single: Size:3.00GiB, Used:2.49GiB
/dev/vda6 3.00GiB
Metadata,DUP: Size:1.00GiB, Used:32.59MiB
/dev/vda6 2.00GiB
System,DUP: Size:9.50MiB, Used:16.00KiB
/dev/vda6 19.00MiB
Unallocated:
/dev/vda6 24.99GiB
Etcd seems OK, but flannel is down.
After some searching I discovered that we cannot ping any other server (inside or outside the CaaSP cluster) from the master and the workers.
I connected to the admin node and it is the same.
So I searched the history and found that my team patched the OS of the hypervisor on 22 October.
My architecture is based on KVM on a single physical SLES 12 SP2 server, and I think that after the upgrade of the hypervisor to SLES 12 SP3 the virtio network card of the KVM guests does not work correctly.
# cat /etc/hosts
#
# hosts This file describes a number of hostname-to-address
# mappings for the TCP/IP subsystem. It is mostly
# used at boot time, when no name servers are running.
# On small systems, this file can be used instead of a
# "named" name server.
# Syntax:
#
# IP-Address Full-Qualified-Hostname Short-Hostname
#
127.0.0.1 localhost
# special IPv6 addresses
::1 localhost ipv6-localhost ipv6-loopback
fe00::0 ipv6-localnet
ff00::0 ipv6-mcastprefix
ff02::1 ipv6-allnodes
ff02::2 ipv6-allrouters
ff02::3 ipv6-allhosts
#-- start Salt-CaaSP managed hosts - DO NOT MODIFY --
### service names ###
127.0.0.1 api api.infra.caasp.local dev-kubm01.unix.sits.credit-agricole.fr
### admin nodes ###
10.198.47.219 admin admin.infra.caasp.local
### kubernetes masters ###
10.198.47.220 f74967034d3743f1b843d227df61c7ad f74967034d3743f1b843d227df61c7ad.infra.caasp.local
### kubernetes workers ###
10.198.47.224 82c1065b62f84a508a9e1ffeb45a5cf2 82c1065b62f84a508a9e1ffeb45a5cf2.infra.caasp.local
10.198.47.223 afbe67218e5b4807a16e84997de79c6f afbe67218e5b4807a16e84997de79c6f.infra.caasp.local
10.198.47.221 12b79838fd734263830ffeb74dbb35bb 12b79838fd734263830ffeb74dbb35bb.infra.caasp.local
10.198.47.222 d246e0d7ff5b49c0996ea10c7bb8ca43 d246e0d7ff5b49c0996ea10c7bb8ca43.infra.caasp.local
#-- end Salt-CaaSP managed hosts --
# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:49:ee:13 brd ff:ff:ff:ff:ff:ff
inet 10.198.47.220/24 brd 10.198.47.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::5054:ff:fe49:ee13/64 scope link
valid_lft forever preferred_lft forever
# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 10.198.47.253 0.0.0.0 UG 0 0 0 eth0
10.198.47.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
# ping 10.198.47.219
PING 10.198.47.219 (10.198.47.219) 56(84) bytes of data.
^C
--- 10.198.47.219 ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 3999ms
# ping 10.198.47.221
PING 10.198.47.221 (10.198.47.221) 56(84) bytes of data.
^C
--- 10.198.47.221 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2014ms
# ping 10.198.47.253
PING 10.198.47.253 (10.198.47.253) 56(84) bytes of data.
^C
--- 10.198.47.253 ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 4002ms
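The connectivity checks above could be scripted so that every node is tested in one pass. What follows is only a sketch: the helper names caasp_ips and check_nodes are hypothetical, and it assumes that the Salt-managed block in /etc/hosts keeps the start/end markers shown above and that the ping in use accepts the iputils options -c and -W.

```shell
#!/bin/sh
# Hypothetical helper: print the IPv4 addresses listed between the
# "start Salt-CaaSP managed hosts" and "end Salt-CaaSP managed hosts"
# markers of a hosts file read from stdin, skipping 127.0.0.1.
caasp_ips() {
    awk '
        /start Salt-CaaSP managed hosts/ { inside = 1; next }
        /end Salt-CaaSP managed hosts/   { inside = 0 }
        inside && $1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ && $1 != "127.0.0.1" { print $1 }
    '
}

# Hypothetical helper: ping each address read from stdin three times
# and report whether at least one reply came back.
check_nodes() {
    while read -r ip; do
        if ping -c 3 -W 2 "$ip" </dev/null >/dev/null 2>&1; then
            echo "$ip reachable"
        else
            echo "$ip unreachable"
        fi
    done
}

# Usage (not run here): caasp_ips < /etc/hosts | check_nodes
```

Run on the master, this would list the admin node and every worker, which makes it easier to see whether the loss is total (as in the pings above) or limited to some nodes.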
Regards,
From: Ludovic Cavajani [mailto:ludovic.cavajani at suse.com]
Sent: Thursday, 2 November 2017 16:47
To: Paul Gonin; caasp-beta at lists.suse.com; Le Bihan Stéphane (AMUNDI-ITS)
Subject: Re: [caasp-beta] BTRFS space and quota]
Hello Stéphane,
Can you provide us with the output of:
# du -csh /*
Regards,
On 11/02/2017 11:54 AM, Paul Gonin wrote:
-------- Forwarded message --------
Date: Thu, 2 Nov 2017 10:35:13 +0000
Subject: Re: [caasp-beta] BTRFS space and quota
To: Paul Gonin <paul.gonin at suse.com>, caasp-beta at lists.suse.com
From: Le Bihan Stéphane (AMUNDI-ITS) <stephane.lebihan at amundi.com>
Hi Paul,
Here is the output of snapper ls:
# snapper ls
Type | # | Pre # | Date | User | Cleanup | Description | Userdata
-------+---+-------+---------------------------------+------+---------+-----------------------+---------
single | 0 | | | root | | current |
single | 1 | | Fri 06 Oct 2017 08:47:14 AM UTC | root | | first root filesystem |
I deleted the quota on /var/lib/etcd and tried a balance, but that did not help.
I recreated the quota and ran a rescan, and the value is the same as before the deletion.
For information, I ran du -sh on / and the result is 7.8G.
# du -sh /
du: cannot access '/proc/7982/task/7982/fd/4': No such file or directory
du: cannot access '/proc/7982/task/7982/fdinfo/4': No such file or directory
du: cannot access '/proc/7982/fd/3': No such file or directory
du: cannot access '/proc/7982/fdinfo/3': No such file or directory
7.8G /
Regards,
From: Paul Gonin [mailto:paul.gonin at suse.com]
Sent: Thursday, 2 November 2017 10:55
To: Le Bihan Stéphane (AMUNDI-ITS); caasp-beta at lists.suse.com
Subject: Re: [caasp-beta] BTRFS space and quota
Hi Stephane,
What is the output of
# snapper ls
?
I assume that since there have been no updates yet it should look like
Type | # | Pre # | Date | User | Cleanup | Description | Userdata
-------+---+-------+--------------------------+------+---------+-----------------------+--------------
single | 0 | | | root | | current |
single | 1 | | Tue Oct 31 09:07:13 2017 | root | | first root filesystem |
single | 2 | | Tue Oct 31 09:10:42 2017 | root | number | after installation | important=yes
rgds
Paul
On Tuesday, 31 October 2017 at 13:38 +0000, Le Bihan Stéphane (AMUNDI-ITS) wrote:
Hi Paul,
We work with CaaSP2.
Regards,
From: Paul Gonin [mailto:paul.gonin at suse.com]
Sent: Tuesday, 31 October 2017 14:34
To: Le Bihan Stéphane (AMUNDI-ITS); caasp-beta at lists.suse.com
Subject: Re: [caasp-beta] BTRFS space and quota
Hi Stéphane,
Not that it should make a difference for the issue described, but what version of CaaSP is the cluster running?
Is it CaaSP 2? RC1?
thanks
Paul
On Tuesday, 31 October 2017 at 08:35 +0000, Le Bihan Stéphane (AMUNDI-ITS) wrote:
Hello,
We have a strange case on our CaaSP platform with btrfs quota.
For background, I was out of the office for 3 weeks while other colleagues tested the Kubernetes platform.
When I returned, they asked me for help because the filesystem is full on the master and worker nodes.
I do not know the cause, but I think that, because of a bad configuration, the subvolume /var/lib/etcd grew and then shrank again once the configuration was corrected, while the quota kept all of the space reserved.
When I check the btrfs usage, the filesystem really is full, but a balance has no effect.
After searching I saw that quota is activated and that the subvolume /var/lib/etcd has 90% of the space reserved, but I have not managed to release this space.
Can you help me release the disk space?
· On the master:
# btrfs filesystem usage /
Overall:
Device size: 30.00GiB
Device allocated: 29.99GiB
Device unallocated: 17.00MiB
Device missing: 0.00B
Used: 27.56GiB
Free (estimated): 504.93MiB (min: 496.43MiB)
Data ratio: 1.00
Metadata ratio: 2.00
Global reserve: 16.00MiB (used: 0.00B)
Data,single: Size:27.97GiB, Used:27.49GiB
/dev/vda6 27.97GiB
Metadata,DUP: Size:1.00GiB, Used:32.64MiB
/dev/vda6 2.00GiB
System,DUP: Size:9.50MiB, Used:16.00KiB
/dev/vda6 19.00MiB
Unallocated:
/dev/vda6 17.00MiB
# btrfs fi df /
Data, single: total=27.97GiB, used=27.50GiB
System, DUP: total=9.50MiB, used=16.00KiB
Metadata, DUP: total=1.00GiB, used=32.66MiB
GlobalReserve, single: total=16.00MiB, used=0.00B
# btrfs fi show /
Label: none uuid: 1b0614eb-fc59-4841-bbc5-5318087f6432
Total devices 1 FS bytes used 27.53GiB
devid 1 size 30.00GiB used 29.99GiB path /dev/vda6
# btrfs subvolume list /
ID 257 gen 40 top level 5 path @
ID 258 gen 194820 top level 257 path @/.snapshots
ID 259 gen 197128 top level 258 path @/.snapshots/1/snapshot
ID 260 gen 194810 top level 257 path @/boot/grub2/i386-pc
ID 261 gen 194810 top level 257 path @/boot/grub2/x86_64-efi
ID 262 gen 194810 top level 257 path @/cloud-init-config
ID 263 gen 194810 top level 257 path @/home
ID 264 gen 197081 top level 257 path @/root
ID 265 gen 197111 top level 257 path @/tmp
ID 266 gen 194809 top level 257 path @/var/cache
ID 267 gen 194809 top level 257 path @/var/crash
ID 268 gen 195783 top level 257 path @/var/lib/ca-certificates
ID 269 gen 195783 top level 257 path @/var/lib/cloud
ID 270 gen 24 top level 257 path @/var/lib/docker
ID 271 gen 194810 top level 257 path @/var/lib/dockershim
ID 272 gen 195719 top level 257 path @/var/lib/etcd
ID 273 gen 194810 top level 257 path @/var/lib/kubelet
ID 274 gen 194810 top level 257 path @/var/lib/machines
ID 275 gen 196430 top level 257 path @/var/lib/misc
ID 276 gen 194810 top level 257 path @/var/lib/mysql
ID 277 gen 194810 top level 257 path @/var/lib/nfs
ID 278 gen 194810 top level 257 path @/var/lib/ntp
ID 279 gen 196428 top level 257 path @/var/lib/overlay
ID 280 gen 194810 top level 257 path @/var/lib/rollback
ID 281 gen 196427 top level 257 path @/var/lib/systemd
ID 282 gen 194810 top level 257 path @/var/lib/vmware
ID 283 gen 194810 top level 257 path @/var/lib/wicked
ID 284 gen 197128 top level 257 path @/var/log
ID 285 gen 197111 top level 257 path @/var/spool
ID 286 gen 196428 top level 257 path @/var/tmp
# btrfs qgroup show -pcreFf /var/lib/etcd
qgroupid rfer excl max_rfer max_excl parent child
-------- ---- ---- -------- -------- ------ -----
0/272 25.14GiB 25.14GiB none none --- ---
# du -sh /var/lib/etcd/
417M /var/lib/etcd/
· On one worker:
# btrfs fi usage /
Overall:
Device size: 30.00GiB
Device allocated: 30.00GiB
Device unallocated: 1.00MiB
Device missing: 0.00B
Used: 27.94GiB
Free (estimated): 135.28MiB (min: 135.28MiB)
Data ratio: 1.00
Metadata ratio: 2.00
Global reserve: 16.00MiB (used: 0.00B)
Data,single: Size:27.99GiB, Used:27.86GiB
/dev/vda6 27.99GiB
Metadata,DUP: Size:1.00GiB, Used:43.44MiB
/dev/vda6 2.00GiB
System,DUP: Size:8.00MiB, Used:16.00KiB
/dev/vda6 16.00MiB
Unallocated:
/dev/vda6 1.00MiB
# btrfs fi df /
Data, single: total=27.99GiB, used=27.86GiB
System, DUP: total=8.00MiB, used=16.00KiB
Metadata, DUP: total=1.00GiB, used=43.44MiB
GlobalReserve, single: total=16.00MiB, used=0.00B
# btrfs fi show /
Label: none uuid: 1d7b76f8-f91c-47e0-8be2-a3f02f90ac96
Total devices 1 FS bytes used 27.90GiB
devid 1 size 30.00GiB used 30.00GiB path /dev/vda6
# btrfs qgroup show -pcreFf /var/lib/etcd
qgroupid rfer excl max_rfer max_excl parent child
-------- ---- ---- -------- -------- ------ -----
0/272 20.99GiB 20.99GiB none none --- ---
# du -sh /var/lib/etcd/
452M /var/lib/etcd/
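The mismatch between the qgroup accounting and du shown for both nodes can be put side by side with a short script. This is a sketch under assumptions: qgroup_rfer and to_mib are hypothetical helpers, and the parsing assumes the column layout of the btrfs qgroup show output above (qgroupid first, rfer second) and sizes printed with a KiB, MiB or GiB suffix.

```shell
#!/bin/sh
# Hypothetical helper: read `btrfs qgroup show` output on stdin and print
# the rfer column of the first data row (a row whose first field looks
# like "0/272").
qgroup_rfer() {
    awk '$1 ~ /^[0-9]+\/[0-9]+$/ { print $2; exit }'
}

# Hypothetical helper: convert a size such as "25.14GiB" or "417MiB" to
# whole MiB (truncated). Any other suffix makes the function fail.
to_mib() {
    printf '%s\n' "$1" | awk '
        /KiB$/ { sub(/KiB$/, ""); printf "%d\n", $0 / 1024; next }
        /MiB$/ { sub(/MiB$/, ""); printf "%d\n", $0;        next }
        /GiB$/ { sub(/GiB$/, ""); printf "%d\n", $0 * 1024; next }
        { exit 1 }
    '
}

# Usage (not run here, needs root on a btrfs filesystem with quota enabled):
#   rfer=$(btrfs qgroup show -pcreFf /var/lib/etcd | qgroup_rfer)
#   echo "qgroup rfer: $(to_mib "$rfer") MiB"
#   echo "du:          $(du -sm /var/lib/etcd | cut -f1) MiB"
```

With the master's figures above, this would report about 25743 MiB referenced by the qgroup against roughly 417 MiB visible to du.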
Regards,
_______________________________________________
caasp-beta mailing list
caasp-beta at lists.suse.com
http://lists.suse.com/mailman/listinfo/caasp-beta