From ananda at fieldday.io Wed Nov 1 23:17:37 2017
From: ananda at fieldday.io (Ananda Kammampati)
Date: Wed, 1 Nov 2017 22:17:37 -0700
Subject: [caasp-beta] Need help with Caas-2.0 (RC) installation
In-Reply-To: <0EF6F053-74FF-4423-90FE-4F1D7EFC8873@suse.com>
References: <0EF6F053-74FF-4423-90FE-4F1D7EFC8873@suse.com>
Message-ID:

Hi,

I am looking for some help/guidance as to what I am missing with the CaaSP 2.0 (RC1) platform.
I have tried every combination (static, DHCP, AutoYaST) and I am seeing the same behavior.

IP addresses for my setup are as follows:

Admin node:     172.16.10.50  (admin.susecaas.local)
Master node:    172.16.10.101 (master-01.susecaas.local)
Worker node-01: 172.16.10.201 (node-01.susecaas.local)
Worker node-02: 172.16.10.202 (node-02.susecaas.local)
jumpbox:        172.16.10.10  (the bastion host from which I will be accessing the CaaSP platform)

From the Admin dashboard, I am able to see both master and worker nodes registered.

But when I access the externally facing master node at the URL https://master-01.susecaas.local:6443, this is all I am getting:

From another machine (the jumpbox) on the same network, I downloaded the kubectl config file and placed it in the $HOME/.kube directory.

Now when I try the kubectl command, this is what I am getting:

I would appreciate it if anyone could advise me on what I am missing to get the installation right.

thanks in advance,
Ananda Kammampati
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: jnkkmoejajnfhlme.png
Type: image/png
Size: 178081 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nlnoplelhmbjhpii.png
Type: image/png
Size: 42307 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hdgiagpiggkoefoo.png
Type: image/png
Size: 952767 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: adodgclagmakjife.png
Type: image/png
Size: 248118 bytes
Desc: not available
URL:

From paul.gonin at suse.com Thu Nov 2 03:55:24 2017
From: paul.gonin at suse.com (Paul Gonin)
Date: Thu, 02 Nov 2017 10:55:24 +0100
Subject: [caasp-beta] BTRFS space and quota
In-Reply-To:
References: <1509456813.6297.45.camel@suse.com>
Message-ID: <1509616524.25056.2.camel@suse.com>

Hi Stephane,

What is the output of # snapper ls?

I assume that, since there were no updates yet, it should look like

Type   | # | Pre # | Date                     | User | Cleanup | Description           | Userdata
-------+---+-------+--------------------------+------+---------+-----------------------+--------------
single | 0 |       |                          | root |         | current               |
single | 1 |       | Tue Oct 31 09:07:13 2017 | root |         | first root filesystem |
single | 2 |       | Tue Oct 31 09:10:42 2017 | root | number  | after installation    | important=yes

rgds
Paul

On Tuesday 31 October 2017 at 13:38 +0000, Le Bihan Stéphane (AMUNDI-ITS) wrote:
> Hi Paul,
>
> We work with CaaSP2.
>
> Regards,
>
> Stéphane Le Bihan
> SDE/DSI/IPR/SSD/UNX
> 90, Boulevard Pasteur - 75015 Paris
> Web: http://www.amundi.com
> Tél: +33 1 76 32 32 08
> Equipe Unix : +33 1 76 32 02 30
> @: stephane.lebihan at amundi.com
> @ : sits.unix at amundi.com
>
> From: Paul Gonin [mailto:paul.gonin at suse.com]
> Sent: Tuesday 31 October 2017 14:34
> To: Le Bihan Stéphane (AMUNDI-ITS); caasp-beta at lists.suse.com
> Subject: Re: [caasp-beta] BTRFS space and quota
>
> Hi Stéphane,
>
> Not that it should make a difference for the issue described, but what version of CaaSP is the cluster running?
> Is it CaaSP2? RC1?
>
> thanks
> Paul
>
> On Tuesday 31 October 2017 at 08:35 +0000, Le Bihan Stéphane (AMUNDI-ITS) wrote:
> > Hello,
> >
> > We have a strange case on the CaaSP platform with btrfs quota.
> >
> > Some history: I was out of the office for 3 weeks, but other colleagues tested the Kubernetes platform.
> > When I returned, I was asked to look at it because the filesystem is full on the master and worker nodes.
> > I do not know the cause, but I think that with a bad configuration the subvolume /var/lib/etcd grew and, after the correction, shrank again, yet the quota still reserves all the space.
> >
> > When I check, I see from the btrfs usage that it is really full, but a balance has no effect.
> > After searching I see that quota is activated and the subvolume /var/lib/etcd reserves 90% of the space, but I do not succeed in releasing this space.
> >
> > Can you help me release the disk space?
> >
> > On master:
> >
> > # btrfs filesystem usage /
> > Overall:
> >     Device size:                  30.00GiB
> >     Device allocated:             29.99GiB
> >     Device unallocated:           17.00MiB
> >     Device missing:                  0.00B
> >     Used:                         27.56GiB
> >     Free (estimated):            504.93MiB      (min: 496.43MiB)
> >     Data ratio:                       1.00
> >     Metadata ratio:                   2.00
> >     Global reserve:               16.00MiB      (used: 0.00B)
> >
> > Data,single: Size:27.97GiB, Used:27.49GiB
> >    /dev/vda6      27.97GiB
> >
> > Metadata,DUP: Size:1.00GiB, Used:32.64MiB
> >    /dev/vda6       2.00GiB
> >
> > System,DUP: Size:9.50MiB, Used:16.00KiB
> >    /dev/vda6      19.00MiB
> >
> > Unallocated:
> >    /dev/vda6      17.00MiB
> >
> > # btrfs fi df /
> > Data, single: total=27.97GiB, used=27.50GiB
> > System, DUP: total=9.50MiB, used=16.00KiB
> > Metadata, DUP: total=1.00GiB, used=32.66MiB
> > GlobalReserve, single: total=16.00MiB, used=0.00B
> >
> > # btrfs fi show /
> > Label: none  uuid: 1b0614eb-fc59-4841-bbc5-5318087f6432
> >         Total devices 1 FS bytes used 27.53GiB
> >         devid    1 size 30.00GiB used 29.99GiB path /dev/vda6
> >
> > # btrfs subvolume list /
> > ID 257 gen 40 top level 5 path @
> > ID 258 gen 194820 top level 257 path @/.snapshots
> > ID 259 gen 197128 top level 258 path @/.snapshots/1/snapshot
> > ID 260 gen 194810 top level 257 path @/boot/grub2/i386-pc
> > ID 261 gen 194810 top level 257 path @/boot/grub2/x86_64-efi
> > ID 262 gen 194810 top level 257 path @/cloud-init-config
> > ID 263 gen 194810 top level 257 path @/home
> > ID 264 gen 197081 top level 257 path @/root
> > ID 265 gen 197111 top level 257 path @/tmp
> > ID 266 gen 194809 top level 257 path @/var/cache
> > ID 267 gen 194809 top level 257 path @/var/crash
> > ID 268 gen 195783 top level 257 path @/var/lib/ca-certificates
> > ID 269 gen 195783 top level 257 path @/var/lib/cloud
> > ID 270 gen 24 top level 257 path @/var/lib/docker
> > ID 271 gen 194810 top level 257 path @/var/lib/dockershim
> > ID 272 gen 195719 top level 257 path @/var/lib/etcd
> > ID 273 gen 194810 top level 257 path @/var/lib/kubelet
> > ID 274 gen 194810 top level 257 path @/var/lib/machines
> > ID 275 gen 196430 top level 257 path @/var/lib/misc
> > ID 276 gen 194810 top level 257 path @/var/lib/mysql
> > ID 277 gen 194810 top level 257 path @/var/lib/nfs
> > ID 278 gen 194810 top level 257 path @/var/lib/ntp
> > ID 279 gen 196428 top level 257 path @/var/lib/overlay
> > ID 280 gen 194810 top level 257 path @/var/lib/rollback
> > ID 281 gen 196427 top level 257 path @/var/lib/systemd
> > ID 282 gen 194810 top level 257 path @/var/lib/vmware
> > ID 283 gen 194810 top level 257 path @/var/lib/wicked
> > ID 284 gen 197128 top level 257 path @/var/log
> > ID 285 gen 197111 top level 257 path @/var/spool
> > ID 286 gen 196428 top level 257 path @/var/tmp
> >
> > # btrfs qgroup show -pcreFf /var/lib/etcd
> > qgroupid         rfer         excl     max_rfer     max_excl parent  child
> > --------         ----         ----     --------     -------- ------  -----
> > 0/272        25.14GiB     25.14GiB         none         none ---     ---
> >
> > # du -sh /var/lib/etcd/
> > 417M    /var/lib/etcd/
> >
> > On one worker:
> >
> > # btrfs fi usage /
> > Overall:
> >     Device size:                  30.00GiB
> >     Device allocated:             30.00GiB
> >     Device unallocated:            1.00MiB
> >     Device missing:                  0.00B
> >     Used:                         27.94GiB
> >     Free (estimated):            135.28MiB      (min: 135.28MiB)
> >     Data ratio:                       1.00
> >     Metadata ratio:                   2.00
> >     Global reserve:               16.00MiB      (used: 0.00B)
> >
> > Data,single: Size:27.99GiB, Used:27.86GiB
> >    /dev/vda6      27.99GiB
> >
> > Metadata,DUP: Size:1.00GiB, Used:43.44MiB
> >    /dev/vda6       2.00GiB
> >
> > System,DUP: Size:8.00MiB, Used:16.00KiB
> >    /dev/vda6      16.00MiB
> >
> > Unallocated:
> >    /dev/vda6       1.00MiB
> >
> > # btrfs fi df /
> > Data, single: total=27.99GiB, used=27.86GiB
> > System, DUP: total=8.00MiB, used=16.00KiB
> > Metadata, DUP: total=1.00GiB, used=43.44MiB
> > GlobalReserve, single: total=16.00MiB, used=0.00B
> >
> > # btrfs fi show /
> > Label: none  uuid: 1d7b76f8-f91c-47e0-8be2-a3f02f90ac96
> >         Total devices 1 FS bytes used 27.90GiB
> >         devid    1 size 30.00GiB used 30.00GiB path /dev/vda6
> >
> > # btrfs qgroup show -pcreFf /var/lib/etcd
> > qgroupid         rfer         excl     max_rfer     max_excl parent  child
> > --------         ----         ----     --------     -------- ------  -----
> > 0/272        20.99GiB     20.99GiB         none         none ---     ---
> >
> > # du -sh /var/lib/etcd/
> > 452M    /var/lib/etcd/
> >
> > Regards,
> >
> > Stéphane Le Bihan
> > SDE/DSI/IPR/SSD/UNX
> > 90, Boulevard Pasteur - 75015 Paris
> > Web: http://www.amundi.com
> > Tél: +33 1 76 32 32 08
> > Equipe Unix : +33 1 76 32 02 30
> > @: stephane.lebihan at amundi.com
> > @ : sits.unix at amundi.com
> >
> > _______________________________________________
> > caasp-beta mailing list
> > caasp-beta at lists.suse.com
> > http://lists.suse.com/mailman/listinfo/caasp-beta
> _______________________________________________
> caasp-beta mailing list
> caasp-beta at lists.suse.com
> http://lists.suse.com/mailman/listinfo/caasp-beta
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.gif
Type: image/gif
Size: 2430 bytes
Desc: not available
URL:

From stephane.lebihan at amundi.com Thu Nov 2 04:35:13 2017
From: stephane.lebihan at amundi.com (=?utf-8?B?TGUgQmloYW4gU3TDqXBoYW5lIChBTVVOREktSVRTKQ==?=)
Date: Thu, 2 Nov 2017 10:35:13 +0000
Subject: [caasp-beta] BTRFS space and quota
In-Reply-To: <1509616524.25056.2.camel@suse.com>
References: <1509456813.6297.45.camel@suse.com> <1509616524.25056.2.camel@suse.com>
Message-ID:

Hi Paul,

The result of the snapper ls command:
# snapper ls
Type   | # | Pre # | Date                            | User | Cleanup | Description           | Userdata
-------+---+-------+---------------------------------+------+---------+-----------------------+---------
single | 0 |       |                                 | root |         | current               |
single | 1 |       | Fri 06 Oct 2017 08:47:14 AM UTC | root |         | first root filesystem |

I deleted the quota on /var/lib/etcd and tried a balance, but it did not help.
I recreated the quota, ran a rescan, and the value is the same as before the deletion.

For information, I ran du -sh on / and the result is 7.8 GB.

# du -sh /
du: cannot access '/proc/7982/task/7982/fd/4': No such file or directory
du: cannot access '/proc/7982/task/7982/fdinfo/4': No such file or directory
du: cannot access '/proc/7982/fd/3': No such file or directory
du: cannot access '/proc/7982/fdinfo/3': No such file or directory
7.8G    /

Regards,

[cid:image001.gif at 01D353CE.6824B1C0]
Stéphane Le Bihan
SDE/DSI/IPR/SSD/UNX
90, Boulevard Pasteur - 75015 Paris
Web: http://www.amundi.com
Tél: +33 1 76 32 32 08
Equipe Unix : +33 1 76 32 02 30
@: stephane.lebihan at amundi.com
@ : sits.unix at amundi.com

From: Paul Gonin [mailto:paul.gonin at suse.com]
Sent: Thursday 2 November 2017 10:55
To: Le Bihan Stéphane (AMUNDI-ITS); caasp-beta at lists.suse.com
Subject: Re: [caasp-beta] BTRFS space and quota

Hi Stephane,

What is the output of # snapper ls?

I assume that, since there were no updates yet, it should look like

Type   | # | Pre # | Date                     | User | Cleanup | Description           | Userdata
-------+---+-------+--------------------------+------+---------+-----------------------+--------------
single | 0 |       |                          | root |         | current               |
single | 1 |       | Tue Oct 31 09:07:13 2017 | root |         | first root filesystem |
single | 2 |       | Tue Oct 31 09:10:42 2017 | root | number  | after installation    | important=yes

rgds
Paul
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.gif
Type: image/gif
Size: 2430 bytes
Desc: image001.gif
URL:

From paul.gonin at suse.com Thu Nov 2 04:55:05 2017
From: paul.gonin at suse.com (Paul Gonin)
Date: Thu, 02 Nov 2017 11:55:05 +0100
Subject: [caasp-beta] BTRFS space and quota
In-Reply-To:
References: <1509456813.6297.45.camel@suse.com> <1509616524.25056.2.camel@suse.com>
Message-ID: <1509620105.25687.1.camel@suse.com>

Hi Stéphane,

Thanks, we're looking into reproducing this issue.

rgds
Paul
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.gif
Type: image/gif
Size: 2430 bytes
Desc: not available
URL:

From ludovic.cavajani at suse.com Thu Nov 2 09:46:53 2017
From: ludovic.cavajani at suse.com (Ludovic Cavajani)
Date: Thu, 2 Nov 2017 16:46:53 +0100
Subject: [caasp-beta] BTRFS space and quota]
In-Reply-To: <1509620069.25687.0.camel@suse.com>
References: <1509620069.25687.0.camel@suse.com>
Message-ID: <76cfab6f-a91a-ebcb-41ca-2ee364894c6f@suse.com>

Hello Stéphane,

Can you provide us with the output of:

# du -csh /*

Regards,
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.gif
Type: image/gif
Size: 2430 bytes
Desc: not available
URL:

From stephane.lebihan at amundi.com Fri Nov 3 02:48:44 2017
From: stephane.lebihan at amundi.com (=?utf-8?B?TGUgQmloYW4gU3TDqXBoYW5lIChBTVVOREktSVRTKQ==?=)
Date: Fri, 3 Nov 2017 08:48:44 +0000
Subject: [caasp-beta] BTRFS space and quota]
In-Reply-To: <76cfab6f-a91a-ebcb-41ca-2ee364894c6f@suse.com>
References: <1509620069.25687.0.camel@suse.com> <76cfab6f-a91a-ebcb-41ca-2ee364894c6f@suse.com>
Message-ID:

Hello Ludovic,

I can provide the result now, but we managed to restore the free space yesterday, and I think we found the cause.

To restore the free space we stopped etcd.service, removed all files in /var/lib/etcd, and restarted etcd.service.

# systemctl stop etcd
# rm -rf /etc/sysconfig/etcd/member
# systemctl start etcd

# du -csh /*
4.6M    /bin
44M     /boot
0       /cloud-init-config
8.0K    /dev
12M     /etc
0       /home
318M    /lib
14M     /lib64
0       /mnt
0       /opt
du: cannot access '/proc/24205/task/24205/fd/4': No such file or directory
du: cannot access '/proc/24205/task/24205/fdinfo/4': No such file or directory
du: cannot access '/proc/24205/fd/4': No such file or directory
du: cannot access '/proc/24205/fdinfo/4': No such file or directory
0       /proc
3.4M    /root
218M    /run
5.7M    /sbin
0       /selinux
0       /srv
0       /sys
48K     /tmp
1.8G    /usr
5.4G    /var
7.8G    total

# btrfs fi usage /
Overall:
    Device size:                  30.00GiB
    Device allocated:              5.02GiB
    Device unallocated:           24.99GiB
    Device missing:                  0.00B
    Used:                          2.55GiB
    Free (estimated):             25.50GiB      (min: 13.00GiB)
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:               16.00MiB      (used: 0.00B)

Data,single: Size:3.00GiB, Used:2.49GiB
   /dev/vda6       3.00GiB

Metadata,DUP: Size:1.00GiB, Used:32.59MiB
   /dev/vda6       2.00GiB

System,DUP: Size:9.50MiB, Used:16.00KiB
   /dev/vda6      19.00MiB

Unallocated:
   /dev/vda6      24.99GiB

etcd seems OK, but flannel is not working. After searching, I discovered that we cannot ping any other server (whether in CaaSP or not) from the master and the workers. I connected to the admin node and it is the same.

So I searched the history and found that my team patched the hypervisor OS on 22 October. My architecture is based on KVM on one physical SLES12 SP2 server, and I think that after the upgrade of the hypervisor to SLES12 SP3 the KVM virtio network card no longer works correctly.

# cat /etc/hosts
#
# hosts         This file describes a number of hostname-to-address
#               mappings for the TCP/IP subsystem.  It is mostly
#               used at boot time, when no name servers are running.
#               On small systems, this file can be used instead of a
#               "named" name server.
# Syntax:
#
# IP-Address  Full-Qualified-Hostname  Short-Hostname
#
127.0.0.1       localhost

# special IPv6 addresses
::1             localhost ipv6-localhost ipv6-loopback
fe00::0         ipv6-localnet
ff00::0         ipv6-mcastprefix
ff02::1         ipv6-allnodes
ff02::2         ipv6-allrouters
ff02::3         ipv6-allhosts

#-- start Salt-CaaSP managed hosts - DO NOT MODIFY --
### service names ###
127.0.0.1 api api.infra.caasp.local dev-kubm01.unix.sits.credit-agricole.fr

### admin nodes ###
10.198.47.219 admin admin.infra.caasp.local

### kubernetes masters ###
10.198.47.220 f74967034d3743f1b843d227df61c7ad f74967034d3743f1b843d227df61c7ad.infra.caasp.local

### kubernetes workers ###
10.198.47.224 82c1065b62f84a508a9e1ffeb45a5cf2 82c1065b62f84a508a9e1ffeb45a5cf2.infra.caasp.local
10.198.47.223 afbe67218e5b4807a16e84997de79c6f afbe67218e5b4807a16e84997de79c6f.infra.caasp.local
10.198.47.221 12b79838fd734263830ffeb74dbb35bb 12b79838fd734263830ffeb74dbb35bb.infra.caasp.local
10.198.47.222 d246e0d7ff5b49c0996ea10c7bb8ca43 d246e0d7ff5b49c0996ea10c7bb8ca43.infra.caasp.local
#-- end Salt-CaaSP managed hosts --

# ip a
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:49:ee:13 brd ff:ff:ff:ff:ff:ff
    inet 10.198.47.220/24 brd 10.198.47.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe49:ee13/64 scope link
       valid_lft forever preferred_lft forever

# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.198.47.253   0.0.0.0         UG    0      0        0 eth0
10.198.47.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0

# ping 10.198.47.219
PING 10.198.47.219 (10.198.47.219) 56(84) bytes of data.
^C
--- 10.198.47.219 ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 3999ms

# ping 10.198.47.221
PING 10.198.47.221 (10.198.47.221) 56(84) bytes of data.
^C
--- 10.198.47.221 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2014ms

# ping 10.198.47.253
PING 10.198.47.253 (10.198.47.253) 56(84) bytes of data.
^C
--- 10.198.47.253 ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 4002ms

Regards,

[cid:image001.gif at 01D3547B.A5D70880]
Stéphane Le Bihan
SDE/DSI/IPR/SSD/UNX
90, Boulevard Pasteur - 75015 Paris
Web: http://www.amundi.com
Tél: +33 1 76 32 32 08
Equipe Unix : +33 1 76 32 02 30
@: stephane.lebihan at amundi.com
@ : sits.unix at amundi.com

From: Ludovic Cavajani [mailto:ludovic.cavajani at suse.com]
Sent: Thursday 2 November 2017 16:47
To: Paul Gonin; caasp-beta at lists.suse.com; Le Bihan Stéphane (AMUNDI-ITS)
Subject: Re: [caasp-beta] BTRFS space and quota]

Hello Stéphane,

Can you provide us with the output of:

# du -csh /*

Regards,
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.gif
Type: image/gif
Size: 2430 bytes
Desc: image001.gif
URL:

From special011 at gmail.com Fri Nov 3 15:43:15 2017
From: special011 at gmail.com (Jerry Hwang)
Date: Fri, 3 Nov 2017 14:43:15 -0700
Subject: [caasp-beta] disk full with many snapshots
Message-ID:

Hi,

I found etcd service fails to start and the root filesystem is full with more than 40 snapshots in /.snapshots which I was not able to delete manually.
How can I free up the space and get etcd to start? It seems similar to the [caasp-beta] BTRFS space and quota thread.

Jerry
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From kukuk at suse.com Fri Nov 3 15:50:51 2017
From: kukuk at suse.com (Thorsten Kukuk)
Date: Fri, 3 Nov 2017 22:50:51 +0100
Subject: [caasp-beta] disk full with many snapshots
In-Reply-To:
References:
Message-ID: <20171103215051.GA10566@suse.com>

On Fri, Nov 03, Jerry Hwang wrote:

> Hi,
>
> I found etcd service fails to start and the root filesystem is full with more
> than 40 snapshots in /.snapshots which I was not able to delete manually.

If you have that many snapshots, it seems you have neglected to update the
cluster for a very, very long time?

Else, why can you not delete them manually? What is the error message?
Are you sure that these are snapshots and not subvolumes?

  Thorsten

--
Thorsten Kukuk, Distinguished Engineer, Senior Architect SLES & CaaSP
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nuernberg, Germany
GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
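As a practical follow-up to Thorsten's question, one way to tell snapper-managed snapshots apart from plain btrfs subvolumes is to compare the snapper listing with the filesystem's snapshot subvolumes. A minimal sketch, assuming the default snapper configuration named "root" on the affected node:

List the snapshots that snapper manages (number, date, description):
# snapper -c root list

List only the subvolumes that are btrfs snapshots:
# btrfs subvolume list -s /

List every subvolume; the /var/lib/* entries seen earlier in this thread are plain subvolumes, while /.snapshots/N/snapshot entries are snapshots:
# btrfs subvolume list /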
5, 90409 Nuernberg, Germany GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg) From special011 at gmail.com Fri Nov 3 17:39:51 2017 From: special011 at gmail.com (Jerry Hwang) Date: Fri, 3 Nov 2017 16:39:51 -0700 Subject: [caasp-beta] disk full with many snapshots In-Reply-To: <20171103215051.GA10566@suse.com> References: <20171103215051.GA10566@suse.com> Message-ID: I am trying to update by running 'transactional-update' but it fails for Permission to access ' https://updates.suse.com/SUSE/Updates/SUSE-CAASP/ALL/x86_64/update/media.1/media?1LmxWFIpNtMM6TL8KaOJMA2V4F7hh9FgQ3g6-LC_8VnDw3yHgl1lYWjy6PqYeoRhJ3DNWivLrowm8AmFjy-A8dTPLn3xArwryc8Gz5RQY0Hcf7jqwTmHV8CKd4Paa-sxg3O46qVx7j4' denied. Btw, is it possible to configure to limit the number of snapshots to prevent such disk full? Thanks, Jerry On Fri, Nov 3, 2017 at 2:50 PM, Thorsten Kukuk wrote: > On Fri, Nov 03, Jerry Hwang wrote: > > > Hi, > > > > I found etcd service fails to start and the root filesystem is full with > more > > than 40 snapshots in /.snapshots which I was not able to delete manually. > > If you have that many snapshots, it seems you ignored to update the > cluster for a very, very long time? > > Else, why can you not delete them manually? What is the error message? > Sure that this are snapshots and not subvolumes? > > Thorsten > > -- > Thorsten Kukuk, Distinguished Engineer, Senior Architect SLES & CaaSP > SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nuernberg, Germany > GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG > Nuernberg) > _______________________________________________ > caasp-beta mailing list > caasp-beta at lists.suse.com > http://lists.suse.com/mailman/listinfo/caasp-beta > -------------- next part -------------- An HTML attachment was scrubbed... URL: From special011 at gmail.com Fri Nov 3 21:18:12 2017 From: special011 at gmail.com (Jerry Hwang) Date: Fri, 3 Nov 2017 20:18:12 -0700 Subject: [caasp-beta] disk full with many snapshots In-Reply-To: References: <20171103215051.GA10566@suse.com> Message-ID: and any way possible to force to delete some snapshots to reclaim some free space? On Fri, Nov 3, 2017 at 4:39 PM, Jerry Hwang wrote: > I am trying to update by running 'transactional-update' but it fails for > Permission to access 'https://updates.suse. > com/SUSE/Updates/SUSE-CAASP/ALL/x86_64/update/media.1/media? > 1LmxWFIpNtMM6TL8KaOJMA2V4F7hh9FgQ3g6-LC_8VnDw3yHgl1lYWjy6PqYeoRhJ3DNWi > vLrowm8AmFjy-A8dTPLn3xArwryc8Gz5RQY0Hcf7jqwTmHV8CKd4Paa-sxg3O46qVx7j4' > denied. > > Btw, is it possible to configure to limit the number of snapshots to > prevent such disk full? > > Thanks, > Jerry > > On Fri, Nov 3, 2017 at 2:50 PM, Thorsten Kukuk wrote: > >> On Fri, Nov 03, Jerry Hwang wrote: >> >> > Hi, >> > >> > I found etcd service fails to start and the root filesystem is full >> with more >> > than 40 snapshots in /.snapshots which I was not able to delete >> manually. >> >> If you have that many snapshots, it seems you ignored to update the >> cluster for a very, very long time? >> >> Else, why can you not delete them manually? What is the error message? >> Sure that this are snapshots and not subvolumes? >> >> Thorsten >> >> -- >> Thorsten Kukuk, Distinguished Engineer, Senior Architect SLES & CaaSP >> SUSE LINUX GmbH, Maxfeldstr. 
5, 90409 Nuernberg, Germany >> GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG >> Nuernberg) >> _______________________________________________ >> caasp-beta mailing list >> caasp-beta at lists.suse.com >> http://lists.suse.com/mailman/listinfo/caasp-beta >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From special011 at gmail.com Fri Nov 3 21:26:09 2017 From: special011 at gmail.com (Jerry Hwang) Date: Fri, 3 Nov 2017 20:26:09 -0700 Subject: [caasp-beta] disk full with many snapshots In-Reply-To: References: <20171103215051.GA10566@suse.com> Message-ID: forgot to answer your question.. I see Read-only file system error, for example rm: cannot remove '1/snapshot/opt': Read-only file system how can I verify if they are snapshots or subvolumes? btrfs subvolume list / output contains ID 258 gen 25393732 top level 257 path @/.snapshots ID 259 gen 164920 top level 258 path @/.snapshots/1/snapshot ... 2 - 48 ... ID 366 gen 145685 top level 258 path @/.snapshots/49/snapshot On Fri, Nov 3, 2017 at 8:18 PM, Jerry Hwang wrote: > and any way possible to force to delete some snapshots to reclaim some > free space? > > On Fri, Nov 3, 2017 at 4:39 PM, Jerry Hwang wrote: > >> I am trying to update by running 'transactional-update' but it fails for >> Permission to access 'https://updates.suse.com >> /SUSE/Updates/SUSE-CAASP/ALL/x86_64/update/media.1/media?1L >> mxWFIpNtMM6TL8KaOJMA2V4F7hh9FgQ3g6-LC_8VnDw3yHgl1lYWjy6PqYeo >> RhJ3DNWivLrowm8AmFjy-A8dTPLn3xArwryc8Gz5RQY0Hcf7jqwTmHV8CKd4 >> Paa-sxg3O46qVx7j4' denied. >> >> Btw, is it possible to configure to limit the number of snapshots to >> prevent such disk full? >> >> Thanks, >> Jerry >> >> On Fri, Nov 3, 2017 at 2:50 PM, Thorsten Kukuk wrote: >> >>> On Fri, Nov 03, Jerry Hwang wrote: >>> >>> > Hi, >>> > >>> > I found etcd service fails to start and the root filesystem is full >>> with more >>> > than 40 snapshots in /.snapshots which I was not able to delete >>> manually. >>> >>> If you have that many snapshots, it seems you ignored to update the >>> cluster for a very, very long time? >>> >>> Else, why can you not delete them manually? What is the error message? >>> Sure that this are snapshots and not subvolumes? >>> >>> Thorsten >>> >>> -- >>> Thorsten Kukuk, Distinguished Engineer, Senior Architect SLES & CaaSP >>> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nuernberg, Germany >>> GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG >>> Nuernberg) >>> _______________________________________________ >>> caasp-beta mailing list >>> caasp-beta at lists.suse.com >>> http://lists.suse.com/mailman/listinfo/caasp-beta >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kukuk at suse.com Sat Nov 4 10:08:24 2017 From: kukuk at suse.com (Thorsten Kukuk) Date: Sat, 4 Nov 2017 17:08:24 +0100 Subject: [caasp-beta] disk full with many snapshots In-Reply-To: References: <20171103215051.GA10566@suse.com> Message-ID: <20171104160824.GB14143@suse.com> On Fri, Nov 03, Jerry Hwang wrote: > forgot to answer your question.. I see Read-only file system error, for example > rm: cannot remove '1/snapshot/opt': Read-only file system Ok, you had big luck that we have a read-only root filesystem, else you could now re-install. Deleting the root filesystem is always a really bad idea. > how can I verify if they are snapshots or subvolumes? Snapshots are subvolumes, too, they are only created in another way. snapper is the tool to manage snapshots. 
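A minimal sketch of that snapper workflow, assuming the default "root" configuration and purely illustrative snapshot numbers (check the output of the first command before deleting anything):

# snapper list                      # show snapshot numbers, types and descriptions
# snapper delete 2-40               # drop a range of old snapshots by number
# snapper cleanup number            # apply the number-based cleanup algorithm
# btrfs quota rescan /              # let qgroup accounting catch up with the freed extents
# btrfs fi usage /                  # verify the space actually came back

To keep the snapshot count from growing unbounded between updates, the limits in /etc/snapper/configs/root can be tightened; the values below are an assumption for illustration, not a CaaSP default:

# snapper set-config NUMBER_CLEANUP=yes NUMBER_LIMIT=10 NUMBER_LIMIT_IMPORTANT=5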
Thorsten -- Thorsten Kukuk, Distinguished Engineer, Senior Architect SLES & CaaSP SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nuernberg, Germany GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg) From special011 at gmail.com Sat Nov 4 16:54:12 2017 From: special011 at gmail.com (Jerry Hwang) Date: Sat, 4 Nov 2017 15:54:12 -0700 Subject: [caasp-beta] disk full with many snapshots In-Reply-To: <20171104160824.GB14143@suse.com> References: <20171103215051.GA10566@suse.com> <20171104160824.GB14143@suse.com> Message-ID: Thanks for the tips. It's good now. On Sat, Nov 4, 2017 at 9:08 AM, Thorsten Kukuk wrote: > On Fri, Nov 03, Jerry Hwang wrote: > > > forgot to answer your question.. I see Read-only file system error, for > example > > rm: cannot remove '1/snapshot/opt': Read-only file system > > Ok, you had big luck that we have a read-only root filesystem, else > you could now re-install. Deleting the root filesystem is always a > really bad idea. > > > how can I verify if they are snapshots or subvolumes? > > Snapshots are subvolumes, too, they are only created in another way. > snapper is the tool to manage snapshots. > > Thorsten > -- > Thorsten Kukuk, Distinguished Engineer, Senior Architect SLES & CaaSP > SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nuernberg, Germany > GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG > Nuernberg) > _______________________________________________ > caasp-beta mailing list > caasp-beta at lists.suse.com > http://lists.suse.com/mailman/listinfo/caasp-beta > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jevans at suse.com Mon Nov 6 02:38:56 2017 From: jevans at suse.com (Jason S. Evans) Date: Mon, 06 Nov 2017 10:38:56 +0100 Subject: [caasp-beta] disk full with many snapshots In-Reply-To: References: <20171103215051.GA10566@suse.com> <20171104160824.GB14143@suse.com> Message-ID: <1509961136.3518.11.camel@suse.com> On Sat, 2017-11-04 at 15:54 -0700, Jerry Hwang wrote: > Thanks for the tips. It's good now. > > Hi Jerry, could you let me know which specific steps you took to fix this issue? This would make a great TID in case someone else sees this in the future. Thanks! --? Best Regards, Jason Evans, Technical Support Engineer? Linux Support Service, EMEA Services Center From special011 at gmail.com Mon Nov 6 09:59:13 2017 From: special011 at gmail.com (Jerry Hwang) Date: Mon, 6 Nov 2017 08:59:13 -0800 Subject: [caasp-beta] disk full with many snapshots In-Reply-To: <1509961136.3518.11.camel@suse.com> References: <20171103215051.GA10566@suse.com> <20171104160824.GB14143@suse.com> <1509961136.3518.11.camel@suse.com> Message-ID: Hi Jason, I resolved the permission denied error by deregister and register the nodes. Then I got 'no enough space on disk' error for transactional update. I used snapper to remove many snapshots because the disk was full at 100 or 99%. After some disk space free, I re-ran 'transactional update', it succeeded and then rebooted the nodes. Jerry On Mon, Nov 6, 2017 at 1:38 AM, Jason S. Evans wrote: > On Sat, 2017-11-04 at 15:54 -0700, Jerry Hwang wrote: > > Thanks for the tips. It's good now. > > > > > > Hi Jerry, could you let me know which specific steps you took to fix this > issue? This would make a great TID in case someone else sees this in the > future. Thanks! 
> > > -- > Best Regards, > > Jason Evans, Technical Support Engineer > Linux Support Service, EMEA Services Center > _______________________________________________ > caasp-beta mailing list > caasp-beta at lists.suse.com > http://lists.suse.com/mailman/listinfo/caasp-beta > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ananda at fieldday.io Mon Nov 6 11:19:03 2017 From: ananda at fieldday.io (Ananda Kammampati) Date: Mon, 6 Nov 2017 10:19:03 -0800 Subject: [caasp-beta] disk full with many snapshots In-Reply-To: References: <20171103215051.GA10566@suse.com> <20171104160824.GB14143@suse.com> <1509961136.3518.11.camel@suse.com> Message-ID: <11af1b20-d005-9e25-b382-a9cb95c6d123@fieldday.io> Hi Jason, Could you please share the commands/steps as how you deregister and register the nodes ? Appreciate if you can point me to any documentation links as well, if you have that handy. thanks, Ananda On 11/6/17 8:59 AM, Jerry Hwang wrote: > Hi Jason, > > I resolved the permission denied error by deregister and register the > nodes. > Then I got 'no enough space on disk' error for transactional update. > I used snapper to remove many snapshots because the disk was full at > 100 or 99%. > After some disk space free, I re-ran 'transactional update', it > succeeded and then rebooted the nodes. > > Jerry > > On Mon, Nov 6, 2017 at 1:38 AM, Jason S. Evans > wrote: > > On Sat, 2017-11-04 at 15:54 -0700, Jerry Hwang wrote: > > Thanks for the tips. It's good now. > > > > > > Hi Jerry, could you let me know which specific steps you took to > fix this issue? This would make a great TID in case someone else > sees this in the future. Thanks! > > > -- > Best Regards, > > Jason Evans, Technical Support Engineer > Linux Support Service, EMEA Services Center > _______________________________________________ > caasp-beta mailing list > caasp-beta at lists.suse.com > http://lists.suse.com/mailman/listinfo/caasp-beta > > > > > > _______________________________________________ > caasp-beta mailing list > caasp-beta at lists.suse.com > http://lists.suse.com/mailman/listinfo/caasp-beta -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephane.lebihan at amundi.com Wed Nov 8 09:30:15 2017 From: stephane.lebihan at amundi.com (=?utf-8?B?TGUgQmloYW4gU3TDqXBoYW5lIChBTVVOREktSVRTKQ==?=) Date: Wed, 8 Nov 2017 16:30:15 +0000 Subject: [caasp-beta] BTRFS space and quota] References: <1509620069.25687.0.camel@suse.com> <76cfab6f-a91a-ebcb-41ca-2ee364894c6f@suse.com> Message-ID: Hi, For information, we have identify problem. As a reminder, my architecture is one hypervisor in SLES12 (SP2) and 5 KVM with CAASP2. Member of my team patch hypervisor from SLES 12 SP2 to SLES 12 SP3. After that we can?t connect to KVM with ssh, but on KVM network not work. We can?t ping gateway or any server?. I rollback hypervisor on snapshots before patch (Boot on snapshots on read-only, execute ?snapper rollback?) KVM restart and network works correctly. CAASP not working because we have delete all file in /var/lib/etcd on all workers and admin nodes. But I think problem on full FS / is a corollary to network problem after upgrade of hypervisor. Thanks for your help. Regards, [cid:image001.gif at 01D35894.8DD70740] St?phane Le Bihan SDE/DSI/IPR/SSD/UNX 90, Boulevard Pasteur - 75015 Paris Web: http://www.amundi.com T?l: +33 1 76 32 32 08 Equipe Unix : +33 1 76 32 02 30 @: stephane.lebihan at amundi.com @ : sits.unix at amundi.com De : Le Bihan St?phane (AMUNDI-ITS) Envoy? 
: vendredi 3 novembre 2017 09:49 ? : 'Ludovic Cavajani'; Paul Gonin; caasp-beta at lists.suse.com Objet : RE: [caasp-beta] BTRFS space and quota] Hello Ludovic, I can provide us result now, but we success to restore free space yesterday. And I think we find cause. For restore free space we have stop etcd.service, remove all file in /var/lib/etcd, and restart etcd.service. # systemctl stop etcd # rm ?rf /etc/sysconfig/etcd/member # systemctl start etcd # du -csh /* 4.6M /bin 44M /boot 0 /cloud-init-config 8.0K /dev 12M /etc 0 /home 318M /lib 14M /lib64 0 /mnt 0 /opt du: cannot access '/proc/24205/task/24205/fd/4': No such file or directory du: cannot access '/proc/24205/task/24205/fdinfo/4': No such file or directory du: cannot access '/proc/24205/fd/4': No such file or directory du: cannot access '/proc/24205/fdinfo/4': No such file or directory 0 /proc 3.4M /root 218M /run 5.7M /sbin 0 /selinux 0 /srv 0 /sys 48K /tmp 1.8G /usr 5.4G /var 7.8G total # btrfs fi usage / Overall: Device size: 30.00GiB Device allocated: 5.02GiB Device unallocated: 24.99GiB Device missing: 0.00B Used: 2.55GiB Free (estimated): 25.50GiB (min: 13.00GiB) Data ratio: 1.00 Metadata ratio: 2.00 Global reserve: 16.00MiB (used: 0.00B) Data,single: Size:3.00GiB, Used:2.49GiB /dev/vda6 3.00GiB Metadata,DUP: Size:1.00GiB, Used:32.59MiB /dev/vda6 2.00GiB System,DUP: Size:9.50MiB, Used:16.00KiB /dev/vda6 19.00MiB Unallocated: /dev/vda6 24.99GiB Etcd seems ok, but flannel is KO. After search I discover we can?t ping all other server (in or not in CAASP) from master and worker. I connect to admin node and it?s same. So I search in history, and I found my team patch OS of hypervisor on 22-October. My architecture is based on KVM, on one physical server SLES12 SP2, but I think after upgrade of hypervisor on SLES12 SP3, virtio card of KVM don?t work correctly? # cat /etc/hosts # # hosts This file describes a number of hostname-to-address # mappings for the TCP/IP subsystem. It is mostly # used at boot time, when no name servers are running. # On small systems, this file can be used instead of a # "named" name server. 
# Syntax: # # IP-Address Full-Qualified-Hostname Short-Hostname # 127.0.0.1 localhost # special IPv6 addresses ::1 localhost ipv6-localhost ipv6-loopback fe00::0 ipv6-localnet ff00::0 ipv6-mcastprefix ff02::1 ipv6-allnodes ff02::2 ipv6-allrouters ff02::3 ipv6-allhosts #-- start Salt-CaaSP managed hosts - DO NOT MODIFY -- ### service names ### 127.0.0.1 api api.infra.caasp.local dev-kubm01.unix.sits.credit-agricole.fr ### admin nodes ### 10.198.47.219 admin admin.infra.caasp.local ### kubernetes masters ### 10.198.47.220 f74967034d3743f1b843d227df61c7ad f74967034d3743f1b843d227df61c7ad.infra.caasp.local ### kubernetes workers ### 10.198.47.224 82c1065b62f84a508a9e1ffeb45a5cf2 82c1065b62f84a508a9e1ffeb45a5cf2.infra.caasp.local 10.198.47.223 afbe67218e5b4807a16e84997de79c6f afbe67218e5b4807a16e84997de79c6f.infra.caasp.local 10.198.47.221 12b79838fd734263830ffeb74dbb35bb 12b79838fd734263830ffeb74dbb35bb.infra.caasp.local 10.198.47.222 d246e0d7ff5b49c0996ea10c7bb8ca43 d246e0d7ff5b49c0996ea10c7bb8ca43.infra.caasp.local #-- end Salt-CaaSP managed hosts -- # ip a 1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: eth0: mtu 1500 qdisc pfifo_fast state UP group default qlen 1000 link/ether 52:54:00:49:ee:13 brd ff:ff:ff:ff:ff:ff inet 10.198.47.220/24 brd 10.198.47.255 scope global eth0 valid_lft forever preferred_lft forever inet6 fe80::5054:ff:fe49:ee13/64 scope link valid_lft forever preferred_lft forever # route -n Kernel IP routing table Destination Gateway Genmask Flags Metric Ref Use Iface 0.0.0.0 10.198.47.253 0.0.0.0 UG 0 0 0 eth0 10.198.47.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0 # ping 10.198.47.219 PING 10.198.47.219 (10.198.47.219) 56(84) bytes of data. ^C --- 10.198.47.219 ping statistics --- 5 packets transmitted, 0 received, 100% packet loss, time 3999ms # ping 10.198.47.221 PING 10.198.47.221 (10.198.47.221) 56(84) bytes of data. ^C --- 10.198.47.221 ping statistics --- 3 packets transmitted, 0 received, 100% packet loss, time 2014ms # ping 10.198.47.253 PING 10.198.47.253 (10.198.47.253) 56(84) bytes of data. ^C --- 10.198.47.253 ping statistics --- 5 packets transmitted, 0 received, 100% packet loss, time 4002ms Regards, [cid:image001.gif at 01D35894.8DD70740] St?phane Le Bihan SDE/DSI/IPR/SSD/UNX 90, Boulevard Pasteur - 75015 Paris Web: http://www.amundi.com T?l: +33 1 76 32 32 08 Equipe Unix : +33 1 76 32 02 30 @: stephane.lebihan at amundi.com @ : sits.unix at amundi.com De : Ludovic Cavajani [mailto:ludovic.cavajani at suse.com] Envoy? : jeudi 2 novembre 2017 16:47 ? : Paul Gonin; caasp-beta at lists.suse.com; Le Bihan St?phane (AMUNDI-ITS) Objet : Re: [caasp-beta] BTRFS space and quota] Hello St?phane, Can you provide us the output of : # du -csh /* Regards, On 11/02/2017 11:54 AM, Paul Gonin wrote: -------- Message transf?r? -------- Date: Thu, 2 Nov 2017 10:35:13 +0000 Objet: Re: [caasp-beta] BTRFS space and quota ?: Paul Gonin >, caasp-beta at lists.suse.com > De: Le Bihan St?phane (AMUNDI-ITS) > Hi Paul, The result of command snapper ls. 
# snapper ls Type | # | Pre # | Date | User | Cleanup | Description | Userdata -------+---+-------+---------------------------------+------+---------+-----------------------+--------- single | 0 | | | root | | current | single | 1 | | Fri 06 Oct 2017 08:47:14 AM UTC | root | | first root filesystem | I delete quota on /var/lb/etcd, and test balance but it?s not ok. I recreate quota and rescan and value is same before deletion. For information I launch du ?sh on / and result is 7.8Go. # du -sh / du: cannot access '/proc/7982/task/7982/fd/4': No such file or directory du: cannot access '/proc/7982/task/7982/fdinfo/4': No such file or directory du: cannot access '/proc/7982/fd/3': No such file or directory du: cannot access '/proc/7982/fdinfo/3': No such file or directory 7.8G / Regards, [cid:image001.gif at 01D35894.8DD70740] St?phane Le Bihan SDE/DSI/IPR/SSD/UNX 90, Boulevard Pasteur - 75015 Paris Web: http://www.amundi.com T?l: +33 1 76 32 32 08 Equipe Unix : +33 1 76 32 02 30 @: stephane.lebihan at amundi.com @ : sits.unix at amundi.com De : Paul Gonin [mailto:paul.gonin at suse.com] Envoy? : jeudi 2 novembre 2017 10:55 ? : Le Bihan St?phane (AMUNDI-ITS); caasp-beta at lists.suse.com Objet : Re: [caasp-beta] BTRFS space and quota Hi Stephane, What is the output of # snapper ls ? I assume that since you there were no updates yet it should look like Type | # | Pre # | Date | User | Cleanup | Description | Userdata -------+---+-------+--------------------------+------+---------+-----------------------+-------------- single | 0 | | | root | | current | single | 1 | | Tue Oct 31 09:07:13 2017 | root | | first root filesystem | single | 2 | | Tue Oct 31 09:10:42 2017 | root | number | after installation | important=yes rgds Paul Le mardi 31 octobre 2017 ? 13:38 +0000, Le Bihan St?phane (AMUNDI-ITS) a ?crit : Hi Paul, We work with CaaSP2. Regards, [cid:image001.gif at 01D35894.8DD70740] St?phane Le Bihan SDE/DSI/IPR/SSD/UNX 90, Boulevard Pasteur - 75015 Paris Web: http://www.amundi.com T?l: +33 1 76 32 32 08 Equipe Unix : +33 1 76 32 02 30 @: stephane.lebihan at amundi.com @ : sits.unix at amundi.com De : Paul Gonin [mailto:paul.gonin at suse.com] Envoy? : mardi 31 octobre 2017 14:34 ? : Le Bihan St?phane (AMUNDI-ITS); caasp-beta at lists.suse.com Objet : Re: [caasp-beta] BTRFS space and quota Hi St?phane, Not that it should make a difference for the issue described, what version of CaaSP the cluster is running ? Is it CaaSP2 ? RC1 ? thanks Paul Le mardi 31 octobre 2017 ? 08:35 +0000, Le Bihan St?phane (AMUNDI-ITS) a ?crit : Hello, We have a strange case on CAASP plateform with btrfs quota. For history, I was out of office since 3 weeks, but others colleague test kubernetes plateform. When I return, we ask me because FS is full on master and worker nodes. I don?t have cause, but I think with a bad config, subvolume /var/lib/etcd grown and after correction reduce, though quota reserved all space. When I check, I see btrfs usage and it?s really full, but balance as no effect. After search I see quota is activate, and subvolumes /var/lib/etcd reserved 90% of space. But I don?t succeed to release this space. Can you help me for release space disk ? ? 
On master : # btrfs filesystem usage / Overall: Device size: 30.00GiB Device allocated: 29.99GiB Device unallocated: 17.00MiB Device missing: 0.00B Used: 27.56GiB Free (estimated): 504.93MiB (min: 496.43MiB) Data ratio: 1.00 Metadata ratio: 2.00 Global reserve: 16.00MiB (used: 0.00B) Data,single: Size:27.97GiB, Used:27.49GiB /dev/vda6 27.97GiB Metadata,DUP: Size:1.00GiB, Used:32.64MiB /dev/vda6 2.00GiB System,DUP: Size:9.50MiB, Used:16.00KiB /dev/vda6 19.00MiB Unallocated: /dev/vda6 17.00MiB # btrfs fi df / Data, single: total=27.97GiB, used=27.50GiB System, DUP: total=9.50MiB, used=16.00KiB Metadata, DUP: total=1.00GiB, used=32.66MiB GlobalReserve, single: total=16.00MiB, used=0.00B # btrfs fi show / Label: none uuid: 1b0614eb-fc59-4841-bbc5-5318087f6432 Total devices 1 FS bytes used 27.53GiB devid 1 size 30.00GiB used 29.99GiB path /dev/vda6 # btrfs subvolume list / ID 257 gen 40 top level 5 path @ ID 258 gen 194820 top level 257 path @/.snapshots ID 259 gen 197128 top level 258 path @/.snapshots/1/snapshot ID 260 gen 194810 top level 257 path @/boot/grub2/i386-pc ID 261 gen 194810 top level 257 path @/boot/grub2/x86_64-efi ID 262 gen 194810 top level 257 path @/cloud-init-config ID 263 gen 194810 top level 257 path @/home ID 264 gen 197081 top level 257 path @/root ID 265 gen 197111 top level 257 path @/tmp ID 266 gen 194809 top level 257 path @/var/cache ID 267 gen 194809 top level 257 path @/var/crash ID 268 gen 195783 top level 257 path @/var/lib/ca-certificates ID 269 gen 195783 top level 257 path @/var/lib/cloud ID 270 gen 24 top level 257 path @/var/lib/docker ID 271 gen 194810 top level 257 path @/var/lib/dockershim ID 272 gen 195719 top level 257 path @/var/lib/etcd ID 273 gen 194810 top level 257 path @/var/lib/kubelet ID 274 gen 194810 top level 257 path @/var/lib/machines ID 275 gen 196430 top level 257 path @/var/lib/misc ID 276 gen 194810 top level 257 path @/var/lib/mysql ID 277 gen 194810 top level 257 path @/var/lib/nfs ID 278 gen 194810 top level 257 path @/var/lib/ntp ID 279 gen 196428 top level 257 path @/var/lib/overlay ID 280 gen 194810 top level 257 path @/var/lib/rollback ID 281 gen 196427 top level 257 path @/var/lib/systemd ID 282 gen 194810 top level 257 path @/var/lib/vmware ID 283 gen 194810 top level 257 path @/var/lib/wicked ID 284 gen 197128 top level 257 path @/var/log ID 285 gen 197111 top level 257 path @/var/spool ID 286 gen 196428 top level 257 path @/var/tmp # btrfs qgroup show -pcreFf /var/lib/etcd qgroupid rfer excl max_rfer max_excl parent child -------- ---- ---- -------- -------- ------ ----- 0/272 25.14GiB 25.14GiB none none --- --- # du -sh /var/lib/etcd/ 417M /var/lib/etcd/ ? 
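The qgroup shown above (0/272) accounts for far more space than du finds under /var/lib/etcd. A minimal sketch for cross-checking that gap, offered as a sketch rather than a verified procedure:

# btrfs quota rescan -w /                    # rebuild qgroup accounting and wait for completion
# btrfs qgroup show -pcreFf /var/lib/etcd    # re-check rfer/excl after the rescan
# lsof +L1 | grep etcd                       # deleted-but-still-open files that du no longer counts

When the rescan changes nothing, the space is usually held by something real rather than by stale accounting, for example files deleted while a running process (etcd here) still keeps them open; stopping the service releases them.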
On one worker # btrfs fi usage / Overall: Device size: 30.00GiB Device allocated: 30.00GiB Device unallocated: 1.00MiB Device missing: 0.00B Used: 27.94GiB Free (estimated): 135.28MiB (min: 135.28MiB) Data ratio: 1.00 Metadata ratio: 2.00 Global reserve: 16.00MiB (used: 0.00B) Data,single: Size:27.99GiB, Used:27.86GiB /dev/vda6 27.99GiB Metadata,DUP: Size:1.00GiB, Used:43.44MiB /dev/vda6 2.00GiB System,DUP: Size:8.00MiB, Used:16.00KiB /dev/vda6 16.00MiB Unallocated: /dev/vda6 1.00MiB # btrfs fi df / Data, single: total=27.99GiB, used=27.86GiB System, DUP: total=8.00MiB, used=16.00KiB Metadata, DUP: total=1.00GiB, used=43.44MiB GlobalReserve, single: total=16.00MiB, used=0.00B # btrfs fi show / Label: none uuid: 1d7b76f8-f91c-47e0-8be2-a3f02f90ac96 Total devices 1 FS bytes used 27.90GiB devid 1 size 30.00GiB used 30.00GiB path /dev/vda6 # btrfs qgroup show -pcreFf /var/lib/etcd qgroupid rfer excl max_rfer max_excl parent child -------- ---- ---- -------- -------- ------ ----- 0/272 20.99GiB 20.99GiB none none --- --- # du -sh /var/lib/etcd/ 452M /var/lib/etcd/ Regards, [cid:image001.gif at 01D35894.8DD70740] St?phane Le Bihan SDE/DSI/IPR/SSD/UNX 90, Boulevard Pasteur - 75015 Paris Web: http://www.amundi.com T?l: +33 1 76 32 32 08 Equipe Unix : +33 1 76 32 02 30 @: stephane.lebihan at amundi.com @ : sits.unix at amundi.com _______________________________________________ caasp-beta mailing list caasp-beta at lists.suse.com http://lists.suse.com/mailman/listinfo/caasp-beta _______________________________________________ caasp-beta mailing list caasp-beta at lists.suse.com http://lists.suse.com/mailman/listinfo/caasp-beta _______________________________________________ caasp-beta mailing list caasp-beta at lists.suse.com http://lists.suse.com/mailman/listinfo/caasp-beta -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 2430 bytes Desc: image001.gif URL: From S.M.Flood at uis.cam.ac.uk Fri Nov 17 09:03:39 2017 From: S.M.Flood at uis.cam.ac.uk (Simon Flood) Date: Fri, 17 Nov 2017 16:03:39 +0000 Subject: [caasp-beta] SUSE CaaS Platform 2 download missing VMware files Message-ID: <3298bcd5-d32c-a3c4-e54f-7f39f4adf51c@uis.cam.ac.uk> I see that SUSE CaaS Platform 2 is now available for download except it's missing the VMware .vmx and .vmdk files which were available with RC1. Having not seen a comment on this list is this deliberate or are these AWOL? Thanks, Simon -- Simon Flood HPC System Administrator University of Cambridge Information Services United Kingdom SUSE Knowledge Partner From rushi.ns at sap.com Fri Nov 17 12:43:56 2017 From: rushi.ns at sap.com (Ns, Rushi) Date: Fri, 17 Nov 2017 19:43:56 +0000 Subject: [caasp-beta] SUSE CaaS Platform 2 download missing VMware files In-Reply-To: <3298bcd5-d32c-a3c4-e54f-7f39f4adf51c@uis.cam.ac.uk> References: <3298bcd5-d32c-a3c4-e54f-7f39f4adf51c@uis.cam.ac.uk> Message-ID: I built the cluster with 2.0 already but was getting some errors with dex authentication when try to download kubeconfig file from velum webinterface. My initial thought of this error with multi master setup (I have setup with multi master first ), however even with single master I got the same error, so not sure if this a bug but I can?t download kubeconfig file from velum. -------- I got this error ?internal server error , Login error? ------ Let me know if anyone experience the same. Best Regards, Rushi. 
I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE On 11/17/17, 8:04 AM, "caasp-beta-bounces at lists.suse.com on behalf of Simon Flood" wrote: I see that SUSE CaaS Platform 2 is now available for download except it's missing the VMware .vmx and .vmdk files which were available with RC1. Having not seen a comment on this list is this deliberate or are these AWOL? Thanks, Simon -- Simon Flood HPC System Administrator University of Cambridge Information Services United Kingdom SUSE Knowledge Partner _______________________________________________ caasp-beta mailing list caasp-beta at lists.suse.com http://lists.suse.com/mailman/listinfo/caasp-beta From rushi.ns at sap.com Mon Nov 20 16:13:39 2017 From: rushi.ns at sap.com (Ns, Rushi) Date: Mon, 20 Nov 2017 23:13:39 +0000 Subject: [caasp-beta] kubeconfig download error with DEX internal server error , Login error Message-ID: <7AE2FFC1-1D5A-4814-BE1C-075D837A245D@sap.com> Hello Team, I built the cluster with latest SUSE CAASP 2.0 and was getting errors with dex authentication when downloading kubeconfig file from velum webinterface. Did anyone experience this error. I did multiple setups (multi master and single master) but both clusters have the same error. My initial thought of this error with multi master setup (I have setup with multi master first ), however even with single master I got the same error, so not sure if this a bug but I can?t download kubeconfig file from velum. I got this error -------- ?internal server error , Login error? ------ My login to velum works fine with the same credentials, however for download kubeconfig file the authentication is failing . Let me know if anyone experience the same. Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From Martin.Weiss at suse.com Tue Nov 21 04:11:32 2017 From: Martin.Weiss at suse.com (Martin Weiss) Date: Tue, 21 Nov 2017 04:11:32 -0700 Subject: [caasp-beta] Antw: kubeconfig download error with DEX internal server error , Login error In-Reply-To: <7AE2FFC1-1D5A-4814-BE1C-075D837A245D@sap.com> References: <7AE2FFC1-1D5A-4814-BE1C-075D837A245D@sap.com> Message-ID: <5A1409E40200001C0030593B@prv-mh.provo.novell.com> Hi Rushi, did you specify a specific external FQDN for the API? Could you check if you have a similar strange entry in the /etc/hosts file on the admin with 127.0.0.1 api ... ? --> this was blocking my velum to contact the API on a master and due to that I could not download the kube-config.. Martin Hello Team, I built the cluster with latest SUSE CAASP 2.0 and was getting errors with dex authentication when downloading kubeconfig file from velum webinterface. Did anyone experience this error. I did multiple setups (multi master and single master) but both clusters have the same error. My initial thought of this error with multi master setup (I have setup with multi master first ), however even with single master I got the same error, so not sure if this a bug but I can?t download kubeconfig file from velum. I got this error -------- ?internal server error , Login error? ------ My login to velum works fine with the same credentials, however for download kubeconfig file the authentication is failing . Let me know if anyone experience the same. Best Regards, Rushi. 
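A quick way to check for the kind of entry Martin describes, run on the admin node (the grep pattern is only an illustration):

# grep -n '^127\.0\.0\.1' /etc/hosts    # an "api <external FQDN>" alias on this line would, per Martin's report, keep Velum talking to localhost instead of the master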
I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE _______________________________________________ caasp-beta mailing list caasp-beta at lists.suse.com http://lists.suse.com/mailman/listinfo/caasp-beta -------------- next part -------------- An HTML attachment was scrubbed... URL: From rob.decanha-knight at suse.com Tue Nov 21 06:48:00 2017 From: rob.decanha-knight at suse.com (Rob de Canha-Knight) Date: Tue, 21 Nov 2017 13:48:00 +0000 Subject: [caasp-beta] kubeconfig download error with DEX internal server error , Login error In-Reply-To: <090536B7-4B1A-4425-89C0-B24F589F7606@sap.com> References: <090536B7-4B1A-4425-89C0-B24F589F7606@sap.com> Message-ID: <90BF4755-A3AB-4F6D-9463-5AFEC7F5DF65@suse.com> ?Rushi, You emailed the list about this issue yes I can see that. However, investigating these things takes time. The engineering team need time to investigate it. Please be patient. Vincent is unable to help you with any technical issues he is our beta program manager for all SUSE Beta Programs and will just forward the email to the list again. I can see Martin emailed back this morning with some potential steps to follow that may help. I have attached them here for your convenience. Please attempt them and report back to the caasp-beta at lists.suse.com email He also logged bug ID 1069175 for you with this issue. I have asked you on numerous occasions to log a bug report before and this is now there. If you have not done already please create a Bugzilla account with your rushi.ns at sap.com email so I can add you as a CC to the bug (which will get you updates whenever anyone else adds comments to the bug). If you have already logged a bug and I cannot find it then great; please email caasp-beta at lists.suse.com with the Bugzilla ID number and someone will take a look for you. As I have suggested to you directly before, Martin is asking you to check the value entered into the External FQDN field in Velum is the correct one for your cluster. I asked you to do the same the next time you built a cluster but never heard back and I think you emailed someone else on the mailing list directly. We ask for the bug reports as they go straight to engineering. Emailing myself, Vincent or Simon about the issue without including caasp-beta at lists.suse.com will not make you any progress as we all end up getting different versions of the story without any diagnostic history. If the process is not followed correctly then we end up in the situation we are in now; where various people are getting the same emails from you without the information requested and no bug report logged. Now the bug has been logged it will be investigated but unless you create an account on the SUSE Bugzilla you will not be able to see it. Once you?ve created an account please let the caasp-beta at lists.suse.com list know and we can add you to the bug Martin logged on your behalf and you can continue diagnostics there. Please do not email myself, Simon or Vincent directly again about this issue or remove the caasp-beta at lists.suse.com list from the CC as this makes the email thread very hard to follow and will make the whole process take longer. Emailing random SUSE employees about an issue with no history of the issue or any diagnostic information requested is only going to slow things down in the long run and make it harder for our engineers to help you. 
Now we have a bug logged for you someone soon will email you and the caasp-beta at lists.suse.com with something to try or asking for some diagnostic info. Please do provide it and leave the caasp-beta at lists.suse.com email on CC as this gives your email the widest possible audience and best opportunity for someone to help. Thank you for your patience, Rob ----- Rob de Canha-Knight EMEA Platform and Management Technical Strategist SUSE rob.decanha-knight at suse.com (P) +44 (0) 1635 937689 (M) +44 (0) 7392 087303 (TW) rssfed23 ---- From: "Ns, Rushi" Date: Monday, 20 November 2017 at 23:20 To: Rob de Canha-Knight , Vincent Moutoussamy Cc: Simon Briggs , Vincent Untz Subject: kubeconfig download error with DEX internal server error , Login error Hi Rob, I did to reach betalist email but I wasn?t getting any response. Now I am stuck with this DEX error . Can someone from your team can help. We are getting lot of requests to build with SUSE CAASP as you guys already certified with SAP VORA and this becames a show stopper to me with this error. https://www.suse.com/communities/blog/sap-vora-2-0-released-suse-caasp-1-0/ @Vincent Moutoussamy: Can you help here is the problem ======================= I built the cluster with latest SUSE CAASP 2.0 and was getting errors with dex authentication when downloading kubeconfig file from velum webinterface. Did anyone experience this error. I did multiple setups (multi master and single master) but both clusters have the same error. My initial thought of this error with multi master setup (I have setup with multi master first ), however even with single master I got the same error, so not sure if this a bug but I can?t download kubeconfig file from velum. I got this error -------- ?internal server error , Login error? ------ My login to velum works fine with the same credentials, however for download kubeconfig file the authentication is failing . Let me know if anyone experience the same. Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Rob de Canha-Knight Date: Tuesday, November 14, 2017 at 2:56 PM To: Rushi NS Cc: Simon Briggs , Vincent Untz Subject: Re: KUBEADM method install kubernetes clusters on SUSE 12 SP1/SP2 Rushi, As advised in the previous mail I?m unable to provide any additional support to you on this matter and I have to direct you to obtain support through the usual channels for any additional queries. So please reach out to the caasp-beta mailing list or use Bugzilla to log a bug for investigation if you think the process is being followed correctly as we have not seen this issue internally during 2.0 testing in any of our environments or other beta user environments so we would appreciate the bug report so it can be investigated and fixed by our engineering team if it is indeed a problem with the product. Please note though that due to our HackWeek that we run at SUSE this week you may experience a slightly delayed response to both the caasp-beta mailing list as well as anything put through Bugzilla as in effect our product and engineering teams are off this week. 
Rob ----- Rob de Canha-Knight EMEA Platform and Management Technical Strategist SUSE rob.decanha-knight at suse.com (P) +44 (0) 1635 937689 (M) +44 (0) 7392 087303 (TW) rssfed23 ---- From: "Ns, Rushi" Date: Tuesday, 14 November 2017 at 18:05 To: Rob de Canha-Knight Cc: Simon Briggs , Vincent Untz Subject: Re: KUBEADM method install kubernetes clusters on SUSE 12 SP1/SP2 Hi Rob, Did you get a chance to check my mail and any solution to this problem . Do you think this is bug in the release. Like I said I have tried multi master as well single mater, both iterations the errors result is same Do think if any proxy issues. As you know the systems are behind proxy and I use proxy parameters during the setup. Here is my screenshot of proxy settings. Let me know if anyway to fix this. I can share my screen if you have few mins. this is really killing my team as I need to setup a SUSE based kubernetes which I was trying to do with KUBEADM but I am still hoping CAASP will overcome the issues with KUBEADM alterative but its not going as per my expectations Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Rushi NS Date: Friday, November 10, 2017 at 3:09 PM To: Rob de Canha-Knight Cc: Simon Briggs , Vincent Untz Subject: Re: KUBEADM method install kubernetes clusters on SUSE 12 SP1/SP2 Hi Rob, I have tried with using Dashboard host as admin node as you mentioned (velum host) , after doing everything I got the same error. I think this could be problem with multi master. I did another test with single master and it has the same error. Not sure likely where this error but I did everything correct based on your suggestion. Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Rushi NS Date: Friday, November 10, 2017 at 11:59 AM To: Rob de Canha-Knight Cc: Simon Briggs , Vincent Untz Subject: Re: KUBEADM method install kubernetes clusters on SUSE 12 SP1/SP2 Hi Rob, Ok got it . Because of multi master I do require round robin which either the admin node or something with Laos balancer . Let me try this fix by rebuilt with multi master and if it fails then I will try with single master . Keep you posted. Have a nice weekend . Best Regards, Rushi. Success is not a matter of being the best & winning the race. Success is a matter of handling the worst & finishing the race Sent from my iPhone please excuse typos and brevity On Nov 10, 2017, at 11:30, Rob de Canha-Knight wrote: In the k8s external fqdn that must be a load balancer set up externally from the cluster if doing multi-master. The external dashboard fqdn must be the value of the fqdn that velum is running on the admin node. If your admin node is lvsusekub1 then put that in there. Doing multi-master on bare metal requires a loadbalancer and it?s that loadbalancer address that goes in the top box. If you don?t have a loadbalancer then you can put in any of the master node fqdns and it will work. So put lvsusekub3 in the top box and lvsusekub1 in the bottom box and you can do round robin DNS on your dns server. It?s worth noting that once you enter those values they are fixed and to change them you have to rebuild the cluster from scratch. If this is a development environment I recommend using a single master node and putting that value in the top box and the admin node fqdn in the bottom box. Start simple and build up from there. I?m signing off now for the weekend. Have a good weekend. 
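A sketch of the round-robin DNS Rob mentions, with placeholder names and addresses rather than anything from this cluster, one A record per master for the same API name:

; illustrative BIND-style zone entries for a multi-master API endpoint
api.susecaas.example.    300    IN    A    192.0.2.11
api.susecaas.example.    300    IN    A    192.0.2.12
api.susecaas.example.    300    IN    A    192.0.2.13

The name then resolves to the masters in rotation; a real load balancer remains the more robust option, since plain round-robin DNS keeps handing out the address of a master that is down.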
----- Rob de Canha-Knight EMEA Platform and Management Technical Strategist SUSE rob.decanha-knight at suse.com (P) +44 (0) 1635 937689 (M) +44 (0) 7392 087303 (TW) rssfed23 ---- From: "Ns, Rushi" Date: Friday, 10 November 2017 at 19:24 To: Rob de Canha-Knight Cc: Simon Briggs , Vincent Untz Subject: Re: KUBEADM method install kubernetes clusters on SUSE 12 SP1/SP2 Ok , I agree some point (two boxes only ..i put two boxes with same hostname ?lvsusekub3? and lvsusekube3.pal.sap.corp). I setup with 3 masters as I mentioned before and this host LVSUSEKUB3 is the first master node hostname . I did make sure everything right except FQDN Question>: what is second box I should put hostname My admin node: lvsusekub1 Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Rob de Canha-Knight Date: Friday, November 10, 2017 at 11:19 AM To: Rushi NS Cc: Simon Briggs , Vincent Untz Subject: Re: KUBEADM method install kubernetes clusters on SUSE 12 SP1/SP2 I've identified your problem. The first box is the k8s API endpoint. This field has to be set to the kubernetes master fqdn. I think you have it set to your admin node fqdn and that?s why things are not working. You?ll have to destroy your cluster and make sure that the top field in your screenshot has the fqdn of the k8s master node not the admin node (those two boxes must have different addresses in) ----- Rob de Canha-Knight EMEA Platform and Management Technical Strategist SUSE rob.decanha-knight at suse.com (P) +44 (0) 1635 937689 (M) +44 (0) 7392 087303 (TW) rssfed23 ---- From: "Ns, Rushi" Date: Friday, 10 November 2017 at 19:17 To: Rob de Canha-Knight Cc: Simon Briggs , Vincent Untz Subject: Re: KUBEADM method install kubernetes clusters on SUSE 12 SP1/SP2 Hi Rob, Answer to your queries. You must make sure that you are accessing velum from the right FQDN ? the one you gave velum during the setup process when it asks for the internal and external dashboard FQDN. I set this during API FQDN I did make sure no plugin blocks (java sript) Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Rob de Canha-Knight Date: Friday, November 10, 2017 at 11:13 AM To: Rushi NS Cc: Vincent Untz , Simon Briggs Subject: Re: KUBEADM method install kubernetes clusters on SUSE 12 SP1/SP2 You must make sure that you are accessing velum from the right FQDN ? the one you gave velum during the setup process when it asks for the internal and external dashboard FQDN. Aside from that make sure you?ve not got any browser plugins that are blocking scripts or javascript from running. If you still cannot get it to work then you will have to wait for the 2.0 final release next week and try that. If you run into issues there I cannot help as it doesn?t fall into my role and you?ll have to use the official channels for support. ----- Rob de Canha-Knight EMEA Platform and Management Technical Strategist SUSE rob.decanha-knight at suse.com (P) +44 (0) 1635 937689 (M) +44 (0) 7392 087303 (TW) rssfed23 ---- From: "Ns, Rushi" Date: Friday, 10 November 2017 at 19:09 To: Rob de Canha-Knight Cc: Vincent Untz Subject: Re: KUBEADM method install kubernetes clusters on SUSE 12 SP1/SP2 Thanks. I did the setup with 3 master and 1 minions and its worked nicely but while downloading kubectl file the authentication I set during velum setup is not accepted and I get error downloading the kubectl file > Also I got the error you stated (not being able to talk to the velum API. 
When this happens please refresh your browser page and accept the new certificate.) I refresh but I didn?t get any where accept new certificate but all worked. Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Rob de Canha-Knight Date: Friday, November 10, 2017 at 10:32 AM To: Rushi NS Cc: Vincent Untz Subject: Re: KUBEADM method install kubernetes clusters on SUSE 12 SP1/SP2 It supports multi master and yes; your precious mail is correct. Sent from my iPhone - please excuse any shortness On 10 Nov 2017, at 18:29, Ns, Rushi wrote: Hi Rob, Is this release supports multi master (controllers ? etcd) or single master. Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Rushi NS Date: Friday, November 10, 2017 at 10:17 AM To: Rob de Canha-Knight Cc: Vincent Untz Subject: Re: KUBEADM method install kubernetes clusters on SUSE 12 SP1/SP2 Hi Rob, Perfect and Thanks, I just downloaded and will start deploying and keep you posted. As I understand 2.0 is removed the caasp-cli authentication ? and everything should work as it was before with 1.0 using kubeconfig file downloaded from VELUM web. Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Rob de Canha-Knight Date: Friday, November 10, 2017 at 10:01 AM To: Rushi NS Cc: Vincent Untz Subject: Re: KUBEADM method install kubernetes clusters on SUSE 12 SP1/SP2 November 16th However; you can download our latest release candidate ISO from https://drive.google.com/file/d/1ZO0sduyV5GS3WThl0eLVjnMNHCaFIi5u/view?usp=sharing which doesn?t require you to use caasp-cli. One note; during the bootstrap process you will get an error at the top about not being able to talk to the velum API. When this happens please refresh your browser page and accept the new certificate. Once you have done this it will be able to talk to the API and you?re good to go. To obtain the kubeconfig file you click the button and this will redirect you to a new login page where you enter in your caas platform admin account credentials and it will offer your browser a download of the kubeconfig that has the correct client certificate in it. Many thanks, Rob ----- Rob de Canha-Knight EMEA Platform and Management Technical Strategist SUSE rob.decanha-knight at suse.com (P) +44 (0) 1635 937689 (M) +44 (0) 7392 087303 (TW) rssfed23 ---- From: "Ns, Rushi" Date: Friday, 10 November 2017 at 17:58 To: Rob de Canha-Knight Cc: Vincent Untz Subject: Re: KUBEADM method install kubernetes clusters on SUSE 12 SP1/SP2 HI Rob, What is the ETA for 2.0 release ? Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Rob de Canha-Knight Date: Tuesday, November 7, 2017 at 2:32 PM To: Rushi NS Cc: Carsten Duch , Johannes Grassler , Michal Jura , Nicolas Bock , Simon Briggs , Vincent Untz Subject: Re: KUBEADM method install kubernetes clusters on SUSE 12 SP1/SP2 Thanks Rushi - yes sticking with CaaSP will make your life much easier and enable you to get support as well once a suitable support contract/agreement is in place. When 2.0 is released we will have an updated user manual and deployment guide in the usual place (https://www.suse.com/documentation/suse-caasp/index.html) for you to consume so don?t worry you won?t get in any trouble :) Rob Sent from my iPhone - please excuse any shortness On 7 Nov 2017, at 23:27, Ns, Rushi wrote: Hi Rob, Thank you. Yes, I am sticking to ?CAASP? 
only , since had issues with authorization I wanted to try out with kubeadm to setup a cluster for our DMZ internet facing for federation. KUBEADM is working but its pain as CAASP works nice with everything based on PXE which is what I would like to have in my future builds. If you say the 2.0 is coming out next, then I will wait . please provide the doucemntation how you consume 2.0 , so that I don?t get any trouble. Thank you so much for your quick reply. Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE On 11/7/17, 2:22 PM, "Rob de Canha-Knight" wrote: Hi Rushi. As mentioned on the thread I just sent you; the method Simon is referring to there is the manual upstream way to deploy Kubernetes. It is separate and very different from CaaSP and is completely unsupported in every way. As such; we cannot help you here with the kubeadm way in any way shape or form. Please stick with CaaSP for now if you can or want assistance from us. The version that doesn?t require you to use caasp-cli will be released by the end of next week (2.0 final) and you will be able to deploy that successfully and if you run into any issues we can help you. As a side note I kindly request that you use the CaaSP-beta mailing list for your queries as you did in the past or log a support ticket when you run into issues with the final release. You are likely to get a better response faster than emailing our product team directly plus the knowledge will be archived publicly for everyone else to benefit. Many thanks, Rob Sent from my iPhone - please excuse any shortness On 7 Nov 2017, at 23:13, Ns, Rushi wrote: Hello Simon, How are you . Long time. I have some Question. Not sure if you can answer. As you know we are doing test of ?CAASP? from SUSE , however it is bit pain as CAASP-CLI authentication is boiling down the cluster without access. Rob is aware what I was talking. Since the CAASP is still issue with CAASP-CLI , I was thinking if SLES12 SP1 can work with KUBEADM method to install cluster. Did anyone tried from your side. I found this link but not sure https://forums.suse.com/archive/index.php/t-9637.html. Do you know who is ?simon (smflood)? is that you :( on the above link , he said he did install with KUBEADM using SLES 12 SP1 and SP2 where he has given images links to https://software.opensuse.org/download.html?project=Virtualization%3Acontainers&package=kubernetes can someone help me if KUBEADM method to insetall kubernetes cluster on SUSE 12 SP1/Sp2. Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE On 3/14/17, 2:26 AM, "Simon Briggs" wrote: Hi Rushi, I am part of the team delivering our Expert Day in Rome today so cannot make a call, but I want to make sure things are progressing for you. Please advise if Michal's advise worked or if you have new challenges we can help with. Thanks Simon Briggs On 10/03/17 09:10, Simon Briggs wrote: Hi Rushi, AJ has answered the CaaSP question. Bit I can help explain that SOC7 is now fully GA and can be downloaded freely from the https://www.suse.com/download-linux/ Cloud click through. Thanks Simon On 09/03/17 21:54, Ns, Rushi wrote: Hi Michaal, Any update on this. I am eagerly waiting for the change as I wil start the setup again when SOC7 GA comes out. @Vincent: Do you know when SOC7 GA comes out . Also CaaS Beta ? Best Regards, Rushi. 
I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE On 2/23/17, 7:14 AM, "Ns, Rushi" wrote: Hi Michal, Good to hear that it's doable, yes please test at your end and let me know. I will wait for your confirmation and procedure how to consume our designated SDN vlan. Best Regards, Rushi. Success is not a matter of being the best & winning the race. Success is a matter of handling the worst & finishing the race Sent from my iPhone please excuse typos and brevity On Feb 23, 2017, at 03:04, Michal Jura wrote: Hi Rushi, It should be possible to use VLAN ID 852 for Magnum private network. You should configure network with name private in advance with vlan ID 852, but I have to test it first. Changing subnet to 192.168.x.x should be durable too, but I have to check it. Please give me some time and I will come back to you. Best regards, Michal On 02/22/2017 11:01 PM, Ns, Rushi wrote: Hi Carsten,. Thank you. As you know we have VLAN ID *852* as SDN in network.json which is already in our switch level. Here I have question or suggestion. Can I use this VLAN 852 for Magnum side as L2 traffic ? we do not want to use 10.x.x.x IP space, so we use non-routable 192.168.x.x kind of IP space which will route through our 852 VLAN . Is it possible to define this in Heat Template, so that cluster deployment will generate 192.168.x.x subnet instead of 10.x.x.x subnet when a kubernetes cluster created? Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE *From: *Carsten Duch *Date: *Wednesday, February 22, 2017 at 10:21 AM *To: *"Ns, Rushi" , Johannes Grassler , Michal Jura , Vincent Untz *Cc: *Nicolas Bock , Simon Briggs *Subject: *AW: Weekly review of SAP Big Data SOC 7 testing Hi Rushi, Theater Problem is that you have configured it to use the vlans from 222 to 2222. You have to choose a range which is allowed on the Trunk port and not already in use. If you want to change the starting Point you have to redeploy the whole cloud and provide the correct vlan id when editing the network.json. So without that, you are only able to change the max number up to a value you are able to use. Maybe 50 for 222 to 272. Or try vxlan instead of vlan again. But I think that the overall problem is a misconfigured switch. Make sure that all vlan ids are allowed for the Trunk and you will have a good chance that it works. Von meinem Samsung Galaxy Smartphone gesendet. -------- Urspr?ngliche Nachricht -------- Von: "Ns, Rushi" Datum: 22.02.17 19:04 (GMT+01:00) An: Carsten Duch , Johannes Grassler , Michal Jura , Vincent Untz Cc: Nicolas Bock , Simon Briggs Betreff: Re: Weekly review of SAP Big Data SOC 7 testing HI Carsten Yes I am aware as we discussed this during our call and after reading your response, however the vlan 222-322 is already used in our production particularly 271 is our Laptop VLAN (All employees get IP address of the Laptops ) which we cannot use it for this. I am looking for alternatives. Let me know if you have any idea other than this 222-322 allow ? Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE On 2/21/17, 10:38 PM, "Carsten Duch" wrote: Hi Rushi, have you tried to configure your switch according to my email from 14th? Maybe you didn't got the mail? I suggested the following configuration on the switch: Your are using linuxbridge with vlan. Make sure to allow tagging of VLANs on the switch and add the range to the allowed VLANs for the TRUNK. 
The range is defined by your fixed vlan and the maximum number of VLANs. starting point: fixed VLAN id = 222 + Maximum Number of VLANs configured in the Neutron Barclamp= 2000 That means that you have to allow a range from 222 to 2222 on your switch side. But I would recommend to reduce the Maximum so that it will not overlap with other existing VLANs. You can reduce it to 100 or something lower and then allow a range from 222 to 322 for the TRUNK Port. You don't need to create all the VLANs manually but you need to allow VLAN tagging for the Port and allow a range. Depending on your switch,the configuration should look something like: switchport trunk allow vlan 222-322 http://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus5000/sw/configuration/guide/cli/CLIConfigurationGuide/AccessTrunk.html Make sure to allow all the VLANs from your network.json for the TRUNK Port. On 21.02.2017 23:40, Ns, Rushi wrote: Hi Michal, Yes, that?s obviously the root cause I found before your email but it is cumbersome to understand the flow of the segmentation ID which I need to discuss how we can overcome. What I observe is, every time I create new cluster the private network generates a new segment ID:: 271, 272, 273 like that?(this is like VLAN) which our floating VLAN should be able to reach only when we add this segment ID (dummy ID 231,232 or whatever generates) to our swith level as real VLAN otherwise the private network subnet cannot reach to floating IP . check attached picture contains the information of segmentation ID: I remember I had one session with one of your SuSE person (carsten.duch at suse.com) recently I shared my system screen and we discussed this network segment issue (Software Defined Networ) and he answered some of that , however it appeared its beyond is knowledge. I have CC?d Carsten here., so you can talk to him. Do you have any idea what needs to be done on the physical network swtich level where the VLANs already connected but not this VLAN (271, 272,whatever) because this is really not easy to allow in real network switch configuration of the VLAN to allow this trunked port which doesn?t exist at all. We had the same issue before in deploying cloud foundry on top of openstack and we fool the switch with the private segment ID created and at the end we found this is a bug in openstack SDN side. Let me know what needs to be done and I can do that. Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE On 2/21/17, 4:27 AM, "Michal Jura" wrote: Hi, This problem looks like there is no connection from private network where kube-master and kube-mionions are launched to Heat PublicURL endpoint. Please fix network configuration. On 02/20/2017 08:10 PM, Johannes Grassler wrote: Hello Rushi, alright, so we are creating a cluster now but the Kubernetes master fails to signal success to the Heat API (that's what WaitConditionTimeout means). Unfortunately this is where debugging becomes fairly hard...can you ssh to the cluster's Kubernetes master and get me /var/log/cloud-init.log and /var/log/cloud-init-output.log please? Maybe we are lucky and find the cause of the problem in these logs. If there's nothing useful in there I'll probably have to come up with some debugging instrumentation next... 
Cheers, Johannes On 02/20/2017 07:53 PM, Ns, Rushi wrote: Hi Johannes, Thanks, I just tried with the changes you mentioned and I see that it made some progress this time (creating private network subnet, heat stack and instance as well cluster ) , however after some time it failed with ? CREATE_FAILED? status. Here is the log incase if you want to dig in more. ================ 2017-02-20 09:01:18.148 92552 INFO oslo.messaging._drivers.impl_rabbit [-] [6c39b368-bbdf-40cd-b1a5-b14da062f692] Reconnected to AMQP server on 10.48.220.40:5672 via [amqp] clientwith port 36265. 2017-02-20 10:36:25.914 92552 INFO magnum.conductor.handlers.cluster_conductor [req-dddc4477-407f-4b84-afbd-f8b657fd02c6 admin openstack - - -] The stack None was not found during cluster deletion. 2017-02-20 10:36:26.515 92552 WARNING magnum.common.cert_manager.local_cert_manager [req-dddc4477-407f-4b84-afbd-f8b657fd02c6 admin openstack - - -] Deleting certificate e426103d-0ecf-4044-9383-63305c667a c2 from the local filesystem. CertManager type 'local' should be used for testing purpose. 2017-02-20 10:36:26.517 92552 WARNING magnum.common.cert_manager.local_cert_manager [req-dddc4477-407f-4b84-afbd-f8b657fd02c6 admin openstack - - -] Deleting certificate a9a20d33-7b54-4393-8385-85c4900a0f 79 from the local filesystem. CertManager type 'local' should be used for testing purpose. 2017-02-20 10:37:39.905 92552 WARNING magnum.common.cert_manager.local_cert_manager [req-d50c84af-7eca-4f76-8e2b-dc49933d0376 admin openstack - - -] Storing certificate data on the local filesystem. CertM anager type 'local' should be used for testing purpose. 2017-02-20 10:37:40.049 92552 WARNING magnum.common.cert_manager.local_cert_manager [req-d50c84af-7eca-4f76-8e2b-dc49933d0376 admin openstack - - -] Storing certificate data on the local filesystem. CertM anager type 'local' should be used for testing purpose. 2017-02-20 10:48:48.172 92552 ERROR magnum.conductor.handlers.cluster_conductor [req-ac20eb45-8ba9-4b73-a771-326122e94ad7 522958fb-fd7c-4c33-84d2-1ae9e60c1574 - - - -] Cluster error, stack status: CREATE_ FAILED, stack_id: e47d528d-f0e7-4a40-a0d3-12501cf5a984, reason: Resource CREATE failed: WaitConditionTimeout: resources.kube_masters.resources[0].resources.master_wait_condition: 0 of 1 received 2017-02-20 10:48:48.510 92552 INFO magnum.service.periodic [req-ac20eb45-8ba9-4b73-a771-326122e94ad7 522958fb-fd7c-4c33-84d2-1ae9e60c1574 - - - -] Sync up cluster with id 15 from CREATE_IN_PROGRESS to CRE ATE_FAILED. Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE On 2/20/17, 9:51 AM, "Johannes Grassler" wrote: Hello Rushi, I took a closer look at the SUSE driver and `--discovery-url none` will definitely take care of any etcd problems. The thing I'm not quite so sure about is the registry bit. Can you please try the following... magnum cluster-template-create --name k8s_template\ --image-id sles-openstack-magnum-kubernetes \ --keypair-id default \ --external-network-id floating \ --dns-nameserver 8.8.8.8 \ --flavor-id m1.magnum \ --master-flavor-id m1.magnum \ --docker-volume-size 5 \ --network-driver flannel \ --coe kubernetes \ --floating-ip-enabled \ --tls-disabled \ --http-proxy http://proxy.pal.sap.corp:8080 magnum cluster-create --name k8s_cluster \ --cluster-template k8s_template \ --master-count 1 \ --node-count 2 \ --discovery-url none ...and see if that yields a working cluster for you? 
It still won't work in a completely disconnected environment, but with the proxy you have in place it should work. Some explanation: the --discovery-url none will disable the validation check that causes the GetDiscoveryUrlFailed error, allowing Magnum to instantiate the Heat template making up the cluster. The --http-proxy http://proxy.pal.sap.corp:8080 will then cause the cluster to try and access the Docker registry through the proxy. As far as I understand our driver, the --registry-enabled --labels registry_url=URL will require you to set up a local docker registry in a network reachable from the Magnum bay's instances and specify a URL pointing to that docker registry. I'd rather not ask you to do that if access through the proxy turns out to work. Cheers, Johannes On 02/20/2017 04:23 PM, Ns, Rushi wrote: Hi Johannes, I have also added https_proxy parameter thought it might need both (http and https) but even that failed too. I see the log expected to have discovery etcd. magnum cluster-template-create --name k8s_template --image-id sles-openstack-magnum-kubernetes --keypair-id default --external-network-id floating --dns-nameserver 8.8.8.8 --flavor-id m1.magnum --master-flavor-id m1.magnum --docker-volume-size 5 --network-driver flannel --coe kubernetes --floating-ip-enabled --tls-disabled --http-proxy http://proxy.pal.sap.corp:8080 --https-proxy http://proxy.pal.sap.corp:8080 magnum-conductor.log ===================== 2017-02-20 07:17:50.390 92552 ERROR oslo_messaging.rpc.server discovery_endpoint=discovery_endpoint) 2017-02-20 07:17:50.390 92552 ERROR oslo_messaging.rpc.server GetDiscoveryUrlFailed: Failed to get discovery url from 'https://discovery.etcd.io/new?size=1'. 2017-02-20 07:17:50.390 92552 ERROR oslo_messaging.rpc.server Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE On 2/20/17, 7:16 AM, "Ns, Rushi" wrote: Hello Johannes, No luck even after adding the internet proxy at the time of cluster template creation and without specify anything at the cluster-create . The cluster create failed and this time I don?t see anything like , no heat stack created, no private kubernetes network subnet created and many. Here are the commands I tried. Let me know if this is how supposed to be used or am I doing something wrong. magnum cluster-template-create --name k8s_template --image-id sles-openstack-magnum-kubernetes --keypair-id default --external-network-id floating --dns-nameserver 8.8.8.8 --flavor-id m1.magnum --master-flavor-id m1.magnum --docker-volume-size 5 --network-driver flannel --coe kubernetes --floating-ip-enabled --tls-disabled --http-proxy http://proxy.pal.sap.corp:8080 magnum cluster-create --name k8s_cluster --cluster-template k8s_template --master-count 1 --node-count 2 this is the magnum-conductor.log I see something more needed . 
2017-02-20 06:55:27.245 92552 ERROR magnum.drivers.common.template_def [-] HTTPSConnectionPool(host='discovery.etcd.io', port=443): Max retries exceeded with url: /new?size=1 (Caused by NewConnectionError (': Failed to establish a new connection: [Errno 113] EHOSTUNREACH',)) 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server [-] Exception during message handling 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server Traceback (most recent call last): 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 133, in _process_incoming 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message) 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 150, in dispatch 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args) 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 121, in _do_dispatch 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args) 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/magnum/conductor/handlers/cluster_conductor.py", line 165, in cluster_create 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server create_timeout) 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/magnum/conductor/handlers/cluster_conductor.py", line 97, in _create_stack 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server _extract_template_definition(context, cluster)) 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/magnum/conductor/handlers/cluster_conductor.py", line 82, in _extract_template_definition 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server scale_manager=scale_manager) 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/magnum/drivers/common/template_def.py", line 337, in extract_definition 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server self.get_params(context, cluster_template, cluster, **kwargs), 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/magnum/drivers/k8s_opensuse_v1/template_def.py", line 50, in get_params 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server extra_params['discovery_url'] = self.get_discovery_url(cluster) 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/magnum/drivers/common/template_def.py", line 445, in get_discovery_url 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server discovery_endpoint=discovery_endpoint) 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server GetDiscoveryUrlFailed: Failed to get discovery url from 'https://discovery.etcd.io/new?size=1'. 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server [-] Can not acknowledge message. 
Skip processing 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server Traceback (most recent call last): 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 126, in _process_incoming 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server message.acknowledge() 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 119, in acknowledge 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server self.message.acknowledge() 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 251, in acknowledge 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server self._raw_message.ack() 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/kombu/message.py", line 88, in ack 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server self.channel.basic_ack(self.delivery_tag) 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 1584, in basic_ack 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server self._send_method((60, 80), args) 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 56, in _send_method 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server self.channel_id, method_sig, args, content, 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 221, in write_method 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server write_frame(1, channel, payload) 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/amqp/transport.py", line 188, in write_frame 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server frame_type, channel, size, payload, 0xce, 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 385, in sendall 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server tail = self.send(data, flags) 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 379, in send 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server return self._send_loop(self.fd.send, data, flags) 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 366, in _send_loop 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server return send_method(data, *args) 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server error: [Errno 104] Connection reset by peer 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server 2017-02-20 06:55:27.310 92552 ERROR oslo.messaging._drivers.impl_rabbit [-] [6c39b368-bbdf-40cd-b1a5-b14da062f692] AMQP server on 10.48.220.40:5672 is unreachable: . Trying again in 1 seconds. Client port: 50462 2017-02-20 06:55:28.347 92552 INFO oslo.messaging._drivers.impl_rabbit [-] [6c39b368-bbdf-40cd-b1a5-b14da062f692] Reconnected to AMQP server on 10.48.220.40:5672 via [amqp] clientwith port 58264. 
2017-02-20 06:59:09.827 92552 INFO magnum.conductor.handlers.cluster_conductor [req-9b6be3b8-d2fd-4e34-9d08-33d66a270fb1 admin openstack - - -] The stack None was not found during cluster deletion. 2017-02-20 06:59:10.400 92552 WARNING magnum.common.cert_manager.local_cert_manager [req-9b6be3b8-d2fd-4e34-9d08-33d66a270fb1 admin openstack - - -] Deleting certificate 105d39e9-ca2a-497c-b951-df87df2a02 24 from the local filesystem. CertManager type 'local' should be used for testing purpose. 2017-02-20 06:59:10.402 92552 WARNING magnum.common.cert_manager.local_cert_manager [req-9b6be3b8-d2fd-4e34-9d08-33d66a270fb1 admin openstack - - -] Deleting certificate f0004b69-3634-4af9-9fec-d3fdba074f 4c from the local filesystem. CertManager type 'local' should be used for testing purpose. 2017-02-20 07:02:37.658 92552 WARNING magnum.common.cert_manager.local_cert_manager [req-0d9720f7-eb3e-4c9f-870b-e26feb26b9e2 admin openstack - - -] Storing certificate data on the local filesystem. CertM anager type 'local' should be used for testing purpose. 2017-02-20 07:02:37.819 92552 WARNING magnum.common.cert_manager.local_cert_manager [req-0d9720f7-eb3e-4c9f-870b-e26feb26b9e2 admin openstack - - -] Storing certificate data on the local filesystem. CertM anager type 'local' should be used for testing purpose. 2017-02-20 07:02:40.026 92552 ERROR magnum.drivers.common.template_def [req-0d9720f7-eb3e-4c9f-870b-e26feb26b9e2 admin openstack - - -] HTTPSConnectionPool(host='discovery.etcd.io', port=443): Max retries exceeded with url: /new?size=1 (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 113] EH OSTUNREACH',)) 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server [req-0d9720f7-eb3e-4c9f-870b-e26feb26b9e2 admin openstack - - -] Exception during message handling 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server Traceback (most recent call last): 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 133, in _process_incoming 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message) 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 150, in dispatch 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args) 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 121, in _do_dispatch 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args) 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/magnum/conductor/handlers/cluster_conductor.py", line 165, in cluster_create 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server create_timeout) 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/magnum/conductor/handlers/cluster_conductor.py", line 97, in _create_stack 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server _extract_template_definition(context, cluster)) 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/magnum/conductor/handlers/cluster_conductor.py", line 82, in _extract_template_definition 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server scale_manager=scale_manager) 2017-02-20 07:02:40.064 
92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/magnum/drivers/common/template_def.py", line 337, in extract_definition 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server self.get_params(context, cluster_template, cluster, **kwargs), 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/magnum/drivers/k8s_opensuse_v1/template_def.py", line 50, in get_params 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server extra_params['discovery_url'] = self.get_discovery_url(cluster) 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/magnum/drivers/common/template_def.py", line 445, in get_discovery_url 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server discovery_endpoint=discovery_endpoint) 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server GetDiscoveryUrlFailed: Failed to get discovery url from 'https://discovery.etcd.io/new?size=1'. 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE On 2/20/17, 12:41 AM, "Johannes Grassler" wrote: Hello Rushi, On 02/20/2017 12:26 AM, Ns, Rushi wrote: Hi Johannes/Vincent Thank you to both for the detailed. I did those steps as per the link https://www.suse.com/documentation/suse-openstack-cloud-7/book_cloud_suppl/data/sec_deploy_kubernetes_without.html you provided before executing the cluster as I learned this in the document , however I am sure I did something wrong as ii don?t know what public etcd discovery url since I don?t have anything setup on my end. Here are the command I used and if you see I specified that parameter as you suggested but only as ?URL? without knowing the real value of ?URL? (--labels registry_url=URL) , so this is my mistake or how it should be used ? I am not sure, but I followed your document ? ---------------------------------- 1) magnum cluster-template-create --name k8s_template --image-id sles-openstack-magnum-kubernetes --keypair-id default --external-network-id floating --dns-nameserver 8.8.8.8 --flavor-id m1.magnum --master-flavor-id m1.magnum --docker-volume-size 5 --network-driver flannel --coe kubernetes --floating-ip-enabled --tls-disabled --registry-enabled --labels insecure_registry_url=URL 2) magnum cluster-create --name k8s_cluster --cluster-template k8s_template --master-count 1 --node-count 2 --discovery-url none ----------------------------------- Now I would like to understand where and how I can setup my own local etcd discovery service ? is it required. As far as I know etcd it is. I may be wrong though. Luckily there is another solution: Also our internet access is through proxy port (http://proxy.pal.sap.corp:8080) so if you can guide how to do that setup, I can do or tell me the URL value to specified and I can try. Just add an `--http-proxy http://proxy.pal.sap.corp:8080` <%20%20> when creating the cluster template and do NOT provide any discovery URL options for either the cluster template or the cluster itself. Provided the proxy doesn't require authentication this should do the trick... Cheers, Johannes Also I wanted to inform that, we had issue Horizon (public and admin page IP is not hand shake) with BETA 8 Neutron going with VLAN open switch, Nicolas and I had some sessions towards and Nicolas suggested to use ?LinuxBridge instead openvswith? since the patch he has may not be in the BETA8 that I download. . 
you can check with Nicolas on this as our current BETA 8 seems not good with VLAN/openvswitch. In any case, I will remove this cluster and rebuild it soon, but I will wait until the full GA build comes out instead of BETA 8, or I can try sooner if you think the latest BETA 8 will not have issues overall. Please suggest and provide me the help for the above value "--labels insecure_registry_url=URL", or how to set up a local etcd discovery service? Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE On 2/17/17, 1:14 AM, "Vincent Untz" wrote: Hi, Le vendredi 17 février 2017, à 10:02 +0100, Johannes Grassler a écrit : Hello Rushi, sorry, this took me a while to figure out. This is not the issue I initially thought it was. Rather it appears to be related to your local networking setup and/or the cluster template you used. This is the crucial log excerpt: | 2017-02-05 21:32:52.915 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/magnum/drivers/common/template_def.py", line 445, in get_discovery_url | 2017-02-05 21:32:52.915 92552 ERROR oslo_messaging.rpc.server discovery_endpoint=discovery_endpoint) | 2017-02-05 21:32:52.915 92552 ERROR oslo_messaging.rpc.server GetDiscoveryUrlFailed: Failed to get discovery url from 'https://discovery.etcd.io/new?size=1'. Magnum uses etcd to orchestrate its clusters' instances. To that end it requires a discovery URL where cluster members announce their presence. By default Magnum uses the public etcd discovery URL https://discovery.etcd.io/new?size=%(size)d This will not work in an environment without Internet access, which I presume yours is. The solution to this problem is to set up a local etcd discovery service and configure its URL template in magnum.conf: [cluster] etcd_discovery_service_endpoint_format = https://my.discovery.service.local/new?size=%(size)d Ah, this use case is in our doc. Rushi, can you follow what's documented at: https://www.suse.com/documentation/suse-openstack-cloud-7/book_cloud_suppl/data/sec_deploy_kubernetes_without.html Vincent Cheers, Johannes On 02/16/2017 05:03 AM, Ns, Rushi wrote: Hi Simon. For some reason the mail I sent this morning didn't go out; it also didn't bounce back, but I found it was stuck in my drafts. Anyway, sorry about the delay. Here you go again. Please find attached the magnum files as requested. Please find below the output of the other commands. 
------ root at d38-ea-a7-93-e6-64:/var/log # openstack user list +----------------------------------+---------------------+ | ID | Name | +----------------------------------+---------------------+ | d6a6e5c279734387ae2458ee361122eb | admin | | 7cd6e90b024e4775a772449f3aa135d9 | crowbar | | ea68b8bd8e0e4ac3a5f89a4e464b6054 | glance | | c051a197ba644a25b85e9f41064941f6 | cinder | | 374f9b824b9d43d5a7d2cf37505048f0 | neutron | | 062175d609ec428e876ee8f6e0f39ad3 | nova | | f6700a7f9d794819ab8fa9a07997c945 | heat | | dd22c62394754d95a8feccd44c1e2857 | heat_domain_admin | | 9822f3570b004cdca8b360c2f6d4e07b | aodh | | ac06fd30044e427793f7001c72f92096 | ceilometer | | d694b84921b04f168445ee8fcb9432b7 | magnum_domain_admin | | bf8783f04b7a49e2adee33f792ae1cfb | magnum | | 2289a8f179f546239fe337b5d5df48c9 | sahara | | 369724973150486ba1d7da619da2d879 | barbican | | 71dcd06b2e464491ad1cfb3f249a2625 | manila | | e33a098e55c941e7a568305458e2f8fa | trove | +----------------------------------+---------------------+ root at d38-ea-a7-93-e6-64:/var/log # openstack domain list +----------------------------------+---------+---------+-------------------------------------------+ | ID | Name | Enabled | Description | +----------------------------------+---------+---------+-------------------------------------------+ | default | Default | True | The default domain | | f916a54a4c0b4a96954bad9f9b797cf3 | heat | True | Owns users and projects created by heat | | 51557fee0408442f8aacc86e9f8140c6 | magnum | True | Owns users and projects created by magnum | +----------------------------------+---------+---------+-------------------------------------------+ root at d38-ea-a7-93-e6-64:/var/log # openstack role assignment list +----------------------------------+----------------------------------+-------+----------------------------------+----------------------------------+-----------+ | Role | User | Group | Project | Domain | Inherited | +----------------------------------+----------------------------------+-------+----------------------------------+----------------------------------+-----------+ | 6c56316ecd36417184629f78fde5694c | d6a6e5c279734387ae2458ee361122eb | | 6d704aa281874622b02a4e24954ede18 | | False | | 9fe2ff9ee4384b1894a90878d3e92bab | 7cd6e90b024e4775a772449f3aa135d9 | | 7a18242f8e1c4dd9b42d31facb79493f | | False | | 6c56316ecd36417184629f78fde5694c | d6a6e5c279734387ae2458ee361122eb | | 7a18242f8e1c4dd9b42d31facb79493f | | False | | 932db80652074571ba1b98738c5af598 | 7cd6e90b024e4775a772449f3aa135d9 | | 7a18242f8e1c4dd9b42d31facb79493f | | False | | 9fe2ff9ee4384b1894a90878d3e92bab | ea68b8bd8e0e4ac3a5f89a4e464b6054 | | 19c2c03e858b47da83eda020aa83639e | | False | | 6c56316ecd36417184629f78fde5694c | ea68b8bd8e0e4ac3a5f89a4e464b6054 | | 19c2c03e858b47da83eda020aa83639e | | False | | 9fe2ff9ee4384b1894a90878d3e92bab | c051a197ba644a25b85e9f41064941f6 | | 19c2c03e858b47da83eda020aa83639e | | False | | 6c56316ecd36417184629f78fde5694c | c051a197ba644a25b85e9f41064941f6 | | 19c2c03e858b47da83eda020aa83639e | | False | | 9fe2ff9ee4384b1894a90878d3e92bab | 374f9b824b9d43d5a7d2cf37505048f0 | | 19c2c03e858b47da83eda020aa83639e | | False | | 6c56316ecd36417184629f78fde5694c | 374f9b824b9d43d5a7d2cf37505048f0 | | 19c2c03e858b47da83eda020aa83639e | | False | | 9fe2ff9ee4384b1894a90878d3e92bab | 062175d609ec428e876ee8f6e0f39ad3 | | 19c2c03e858b47da83eda020aa83639e | | False | | 6c56316ecd36417184629f78fde5694c | 062175d609ec428e876ee8f6e0f39ad3 | | 19c2c03e858b47da83eda020aa83639e | | False | | 
9fe2ff9ee4384b1894a90878d3e92bab | f6700a7f9d794819ab8fa9a07997c945 | | 19c2c03e858b47da83eda020aa83639e | | False | | 6c56316ecd36417184629f78fde5694c | f6700a7f9d794819ab8fa9a07997c945 | | 19c2c03e858b47da83eda020aa83639e | | False | | 932db80652074571ba1b98738c5af598 | d6a6e5c279734387ae2458ee361122eb | | 7a18242f8e1c4dd9b42d31facb79493f | | False | | 6c56316ecd36417184629f78fde5694c | dd22c62394754d95a8feccd44c1e2857 | | | f916a54a4c0b4a96954bad9f9b797cf3 | False | | 9fe2ff9ee4384b1894a90878d3e92bab | 9822f3570b004cdca8b360c2f6d4e07b | | 19c2c03e858b47da83eda020aa83639e | | False | | 6c56316ecd36417184629f78fde5694c | 9822f3570b004cdca8b360c2f6d4e07b | | 19c2c03e858b47da83eda020aa83639e | | False | | 9fe2ff9ee4384b1894a90878d3e92bab | ac06fd30044e427793f7001c72f92096 | | 19c2c03e858b47da83eda020aa83639e | | False | | 6c56316ecd36417184629f78fde5694c | ac06fd30044e427793f7001c72f92096 | | 19c2c03e858b47da83eda020aa83639e | | False | | 6c56316ecd36417184629f78fde5694c | d694b84921b04f168445ee8fcb9432b7 | | | 51557fee0408442f8aacc86e9f8140c6 | False | | 9fe2ff9ee4384b1894a90878d3e92bab | bf8783f04b7a49e2adee33f792ae1cfb | | 19c2c03e858b47da83eda020aa83639e | | False | | 6c56316ecd36417184629f78fde5694c | bf8783f04b7a49e2adee33f792ae1cfb | | 19c2c03e858b47da83eda020aa83639e | | False | | 9fe2ff9ee4384b1894a90878d3e92bab | 2289a8f179f546239fe337b5d5df48c9 | | 19c2c03e858b47da83eda020aa83639e | | False | | 6c56316ecd36417184629f78fde5694c | 2289a8f179f546239fe337b5d5df48c9 | | 19c2c03e858b47da83eda020aa83639e | | False | | 9fe2ff9ee4384b1894a90878d3e92bab | 369724973150486ba1d7da619da2d879 | | 19c2c03e858b47da83eda020aa83639e | | False | | 6c56316ecd36417184629f78fde5694c | 369724973150486ba1d7da619da2d879 | | 19c2c03e858b47da83eda020aa83639e | | False | | 9fe2ff9ee4384b1894a90878d3e92bab | 71dcd06b2e464491ad1cfb3f249a2625 | | 19c2c03e858b47da83eda020aa83639e | | False | | 6c56316ecd36417184629f78fde5694c | 71dcd06b2e464491ad1cfb3f249a2625 | | 19c2c03e858b47da83eda020aa83639e | | False | | 9fe2ff9ee4384b1894a90878d3e92bab | e33a098e55c941e7a568305458e2f8fa | | 19c2c03e858b47da83eda020aa83639e | | False | | 6c56316ecd36417184629f78fde5694c | e33a098e55c941e7a568305458e2f8fa | | 19c2c03e858b47da83eda020aa83639e | | False | +----------------------------------+----------------------------------+-------+----------------------------------+----------------------------------+-----------+ Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE On 2/15/17, 11:01 AM, "Ns, Rushi" wrote: Hi Simon, I am sorry, I got stuck. Sure I will send the logs now . Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE On 2/15/17, 10:26 AM, "Simon Briggs" wrote: Hi Rushi, I assume you where unable to join our call. Would it be possible to collect the logs that we request, as this is the only way my teams can help you remotely. Regards Simon Briggs On 15/02/17 08:58, Johannes Grassler wrote: Hello Rushi, ok. Can you please supply 1) A supportconfig tarball: this will have the contents of both /etc/magnum/magnum.conf.d/ and magnum-conductor.log which should allow me to figure out what is wrong. 2) The output of `openstack user list`, `openstack domain list`, `openstack role assignment list` (all run as the admin user). With that information I should be able to figure out whether your problem is the one I mentioned earlier. Cheers, Johannes On 02/14/2017 04:42 PM, Ns, Rushi wrote: Hello Johannes, Thank you for the information. 
FYI, my setup is not on HA . Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE On 2/14/17, 12:43 AM, "Johannes Grassler" wrote: Hello Rushi, if the problem is the | Creating cluster failed for the following reason(s): Failed to create trust Error ID: c7a27e1f-6a6a-452e-8d29-a38dbaa3fd78, Failed to create trust Error ID: a9f328cc-05e8-4c87-9876-7db5365812f2 error you mentioned below, the problem is likely to be with Magnum rather than with Heat. Magnum creates a Keystone trust for each cluster that the cluster's VMs use to talk to the Magnum API among others. We had a spell of trouble[0] with that recently and you may be running into the same problem, especially if you are running an HA setup. Are you? If so, check if all files in /etc/magnum/magnum.conf.d/ match across all controller nodes. If there are differences, especially in the [trust] section you are probably affected by the same issue we ran into recently. Cheers, Johannes [0] https://github.com/crowbar/crowbar-openstack/pull/843 On 02/14/2017 09:01 AM, Simon Briggs wrote: Hi Rushi, You advise that you still have an issue. Would this still be the same as the one that Vincent helped with below? I have added Johannes to those CC'd as he is skilled in debugging that type of error. Thanks Simon Sent from my Samsung device -------- Original message -------- From: Vincent Untz Date: 06/02/2017 12:39 (GMT+02:00) To: Rushi Ns Cc: Michal Jura , Nicolas Bock , Simon Briggs Subject: Re: Weekly review of SAP Big Data SOC 7 testing Rushi, About the "Failed to create trust": can you check the heat logs? My guess is that the error comes from there and more context about what's happening around that error would probably be useful. Thanks, Vincent Le lundi 06 f?vrier 2017, ? 04:01 +0000, Ns, Rushi a ?crit : Hi Simon Thank you. Please try if Michal can give some information about the image of kubernetes and how to consume. To me I have full knowledge of kubernetes since from long time also we are in production kubernetes in Germany for many projects which I did. Anyways, please try to get Michal for 1 or 2 hours discussion so that I get idea also please help to find the image from the link provided is not available at this time. http://download.suse.de/ibs/Devel:/Docker:/Images:/SLE12SP2-JeOS-k8s-magnum/images/sles-openstack-magnum-kubernetes.x86_64.qcow2 @Michal: Would you be kind to help me to get the Kuberentes image as bove link is not working Regards to SAHARA, I made progress of upload image (mirantis prepared images of SAHARA Hadoop) and created the necessary configuration (cluster templates, node templates and everything) and at the final creating a cluster from template erord with the following. , so I really need someone from your team having SAHARA knowledge would help to get the issue fixed. here is the error while creating cluster. Creating cluster failed for the following reason(s): Failed to create trust Error ID: c7a27e1f-6a6a-452e-8d29-a38dbaa3fd78, Failed to create trust Error ID: a9f328cc-05e8-4c87-9876-7db5365812f2 [cid:image001.png at 01D27FEA.9C6BC8F0] Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Simon Briggs Date: Saturday, February 4, 2017 at 1:57 AM To: "Ns, Rushi" Cc: Michal Jura , Nicolas Bock , Vincent Untz Subject: Re: Weekly review of SAP Big Data SOC 7 testing Hi Rushi, Thanks for the update and I'm glad we are moving forward. We'll done everyone. 
Michal is indeed an expert around these services, though I am aware he is presently on a sprint team mid cycle so he may find it difficult to do his required workload and deal with external work as well. So please be patient if it takes a small amount of time for him to respond Thanks Simon Sent from my Samsung device -------- Original message -------- From: "Ns, Rushi" Date: 04/02/2017 02:19 (GMT+00:00) To: Simon Briggs Cc: Michal Jura , Nicolas Bock Subject: Re: Weekly review of SAP Big Data SOC 7 testing HI Simon, Just to give you update. The Horizon issue was resolved changing the Nuetron from OPENVSWITCH to LinuxBridge as mentioned by Nick. Now I need to move forward for SAHARA which I can try, but if I run into issues, I might need some expertise who will be having SAHARA knowledge from your team. Regards to other request Magnum (kubernetes) I would like to discuss with Michal Jura (mjura at suse.com), I have Cc?d here as I was going through his github document https://github.com/mjura/kubernetes-demo but wasn?t able to find the image as he specified Link to the image http://download.suse.de/ibs/Devel:/Docker:/Images:/SLE12SP2-JeOS-k8s-magnum/images/sles-openstack-magnum-kubernetes.x86_64.qcow2 Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Simon Briggs Date: Friday, February 3, 2017 at 10:38 AM To: "Ns, Rushi" Subject: Re: Weekly review of SAP Big Data SOC 7 testing Hi, Sorry about delaying you. I will coordinate with Nick to get the best resource for you. Thanks Simon Sent from my Samsung device -------- Original message -------- From: "Ns, Rushi" Date: 03/02/2017 18:33 (GMT+00:00) To: Simon Briggs Subject: Re: Weekly review of SAP Big Data SOC 7 testing Hi Simon, Thank you, I waited on the call, however the toll free number is not US number which call never went through(the Toll free seems UK ), but I stayed on GOtoMEETiNG for 15 mins and disconnected. Sure, I will sync up with Nick and yes you are right it seems not aa code issue, however we are not sure which I will check with Nick in about 1 hour . Keep you posted. Also I need help on Magnum (kubernetes side as well) I see a person Michal Jura (mjura at suse.com) I spoke with Nick to bring Michal on another call to start the Magnum stuff. Can you try to arrange Michal to be with me next week for a short call after this Horizon issue fixed and SAHARA works only after I will work with Michal Jura. Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Simon Briggs Date: Friday, February 3, 2017 at 10:28 AM To: "Ns, Rushi" Subject: Re: Accepted: Weekly review of SAP Big Data SOC 7 testing Hi Rushi, I'm afraid because I'm used to finishing at dinner on Fridays and so it slipped my mind that we had a 6pm arranged. Sorry. I am available now to talk if you want, though I have spoken to Nick and he advised he has tested your Horizon setup and it works OK on his replica environment of what you have. With this situation we can only work with the premises that the Horizon issue is not a code problem but is local to your configuration. He did say he was going to try and help you today on this matter. Did this help? Kind regards Simon Briggs Sent from my Samsung device -------- Original message -------- From: "Ns, Rushi" Date: 02/02/2017 14:22 (GMT+00:00) To: Simon Briggs Subject: Accepted: Weekly review of SAP Big Data SOC 7 testing -- Les gens heureux ne sont pas press?s. 
-- Johannes Grassler, Cloud Developer SUSE Linux GmbH, HRB 21284 (AG Nürnberg) GF: Felix Imendörffer, Jane Smithard, Graham Norton Maxfeldstr. 5, 90409 Nürnberg, Germany -- Les gens heureux ne sont pas pressés. -- Mit freundlichen Grüßen / Best regards Carsten Duch Sales Engineer SUSE Nördlicher Zubringer 9-11 40470 Düsseldorf (P)+49 173 5876 707 (H)+49 521 9497 6388 carsten.duch at suse.com -- SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg) -------------- next part -------------- An embedded message was scrubbed... From: " Martin Weiss " Subject: [caasp-internal] Antw: Re: Can not download kubectl-config in CaaSP2 cluster due to wrong entry in /etc/hosts on the admin node Date: Tue, 21 Nov 2017 04:27:29 -0700 Size: 9371 URL: 
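The embedded message above points to a wrong /etc/hosts entry on the admin node as the cause of the kubectl-config download failure. A minimal way to check for that condition might look like the following; lvsusekub2.pal.sap.corp is the external API FQDN quoted in the next message, so substitute your own:

    # on the admin node: see whether the API FQDN has been pinned to the loopback address
    grep -n 'api' /etc/hosts
    # an entry like "127.0.0.1 api api.infra.caasp.local lvsusekub2.pal.sap.corp" keeps Velum from
    # reaching the API on the master; check what the name really resolves to
    getent hosts lvsusekub2.pal.sap.corp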
From rushi.ns at sap.com Tue Nov 21 08:29:36 2017 From: rushi.ns at sap.com (Ns, Rushi) Date: Tue, 21 Nov 2017 15:29:36 +0000 Subject: [caasp-beta] Antw: kubeconfig download error with DEX internal server error , Login error In-Reply-To: <5A1409E40200001C0030593B@prv-mh.provo.novell.com> References: <7AE2FFC1-1D5A-4814-BE1C-075D837A245D@sap.com> <5A1409E40200001C0030593B@prv-mh.provo.novell.com> Message-ID: <6E31F090-F2AA-47BF-A9BF-62FE474CC079@sap.com> Hi Martin, Thank you. Yes I did; I used the load balancer IP (lvsusekub8.pal.sap.corp), which is outside the cluster node addresses. The host I've specified is not the Velum IP, not the master IP, and not any of the worker IPs. Yes, I do have the same entry as you said in my /etc/hosts file (FYI: lvsusekub2.pal.sap.corp is my API server FQDN). Here is my /etc/hosts file: #-- start Salt-CaaSP managed hosts - DO NOT MODIFY -- ### service names ### 127.0.0.1 api api.infra.caasp.local lvsusekub2.pal.sap.corp Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Martin Weiss Date: Tuesday, November 21, 2017 at 3:18 AM To: "caasp-beta at lists.suse.com" , Rushi NS Subject: Antw: [caasp-beta] kubeconfig download error with DEX internal server error , Login error Hi Rushi, did you specify a specific external FQDN for the API? Could you check if you have a similar strange entry in the /etc/hosts file on the admin with 127.0.0.1 api ... ? --> this was blocking my Velum from contacting the API on a master, and because of that I could not download the kube-config. Martin Hello Team, I built the cluster with the latest SUSE CaaSP 2.0 and was getting errors with Dex authentication when downloading the kubeconfig file from the Velum web interface. Did anyone experience this error? I did multiple setups (multi-master and single-master) but both clusters have the same error. My initial thought was that this error was tied to the multi-master setup (I set up multi-master first), however even with a single master I got the same error, so I am not sure if this is a bug, but I can't download the kubeconfig file from Velum. I got this error: -------- "internal server error , Login error" ------ My login to Velum works fine with the same credentials, however for the kubeconfig file download the authentication is failing. Let me know if anyone experiences the same. Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE _______________________________________________ caasp-beta mailing list caasp-beta at lists.suse.com http://lists.suse.com/mailman/listinfo/caasp-beta -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rushi.ns at sap.com Tue Nov 21 08:59:42 2017 From: rushi.ns at sap.com (Ns, Rushi) Date: Tue, 21 Nov 2017 15:59:42 +0000 Subject: [caasp-beta] kubeconfig download error with DEX internal server error , Login error In-Reply-To: <90BF4755-A3AB-4F6D-9463-5AFEC7F5DF65@suse.com> References: <090536B7-4B1A-4425-89C0-B24F589F7606@sap.com> <90BF4755-A3AB-4F6D-9463-5AFEC7F5DF65@suse.com> Message-ID: <9C1A5FD7-45DA-4C2F-9F89-B91ACEF9436D@sap.com> Hi Rob, Thanks for detailed information as well filing the bug. First of all , my apologies reaching you guys directly, since the issue is rendering for many months (first we had CAASP-CLI issue with authorization and now kubeconfig download issue with DEX) , I thought I can get answers or solutions directly from SUSE guys rather than beta users which are outside SUSE. This is the main reason to contact you guys. With your information I will file bugs going forward and reach only the beta users. Thank you for your suggestion and help filing the bug behalf of me. I have my account with your bug system (https://bugzilla.suse.com/show_bug.cgi?id=1069175) and I have tried to search the bug you filed ang I get this error ?You are not authorized to access bug #1069175? , ? Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Rob de Canha-Knight Date: Tuesday, November 21, 2017 at 6:08 AM To: Rushi NS , Vincent Moutoussamy Cc: Simon Briggs , Vincent Untz , "caasp-beta at lists.suse.com" Subject: Re: kubeconfig download error with DEX internal server error , Login error ?Rushi, You emailed the list about this issue yes I can see that. However, investigating these things takes time. The engineering team need time to investigate it. Please be patient. Vincent is unable to help you with any technical issues he is our beta program manager for all SUSE Beta Programs and will just forward the email to the list again. I can see Martin emailed back this morning with some potential steps to follow that may help. I have attached them here for your convenience. Please attempt them and report back to the caasp-beta at lists.suse.com email He also logged bug ID 1069175 for you with this issue. I have asked you on numerous occasions to log a bug report before and this is now there. If you have not done already please create a Bugzilla account with your rushi.ns at sap.com email so I can add you as a CC to the bug (which will get you updates whenever anyone else adds comments to the bug). If you have already logged a bug and I cannot find it then great; please email caasp-beta at lists.suse.com with the Bugzilla ID number and someone will take a look for you. As I have suggested to you directly before, Martin is asking you to check the value entered into the External FQDN field in Velum is the correct one for your cluster. I asked you to do the same the next time you built a cluster but never heard back and I think you emailed someone else on the mailing list directly. We ask for the bug reports as they go straight to engineering. Emailing myself, Vincent or Simon about the issue without including caasp-beta at lists.suse.com will not make you any progress as we all end up getting different versions of the story without any diagnostic history. If the process is not followed correctly then we end up in the situation we are in now; where various people are getting the same emails from you without the information requested and no bug report logged. 
Now that the bug has been logged it will be investigated, but unless you create an account on the SUSE Bugzilla you will not be able to see it. Once you've created an account please let the caasp-beta at lists.suse.com list know and we can add you to the bug Martin logged on your behalf, and you can continue diagnostics there. Please do not email myself, Simon or Vincent directly again about this issue or remove the caasp-beta at lists.suse.com list from the CC, as this makes the email thread very hard to follow and will make the whole process take longer. Emailing random SUSE employees about an issue with no history of the issue or any of the requested diagnostic information is only going to slow things down in the long run and make it harder for our engineers to help you. Now that we have a bug logged for you, someone will soon email you and the caasp-beta at lists.suse.com list with something to try or asking for some diagnostic info. Please do provide it and leave the caasp-beta at lists.suse.com email on CC, as this gives your email the widest possible audience and the best opportunity for someone to help. Thank you for your patience, Rob ----- Rob de Canha-Knight EMEA Platform and Management Technical Strategist SUSE rob.decanha-knight at suse.com (P) +44 (0) 1635 937689 (M) +44 (0) 7392 087303 (TW) rssfed23 ---- From: "Ns, Rushi" Date: Monday, 20 November 2017 at 23:20 To: Rob de Canha-Knight , Vincent Moutoussamy Cc: Simon Briggs , Vincent Untz Subject: kubeconfig download error with DEX internal server error , Login error Hi Rob, I did try to reach the beta list email but I wasn't getting any response. Now I am stuck with this Dex error. Can someone from your team help? We are getting a lot of requests to build with SUSE CaaSP as you have already certified it with SAP VORA, and this has become a show stopper for me with this error. https://www.suse.com/communities/blog/sap-vora-2-0-released-suse-caasp-1-0/ @Vincent Moutoussamy: Can you help? Here is the problem ======================= I built the cluster with the latest SUSE CaaSP 2.0 and was getting errors with Dex authentication when downloading the kubeconfig file from the Velum web interface. Did anyone experience this error? I did multiple setups (multi-master and single-master) but both clusters have the same error. My initial thought was that this error was tied to the multi-master setup (I set up multi-master first), however even with a single master I got the same error, so I am not sure if this is a bug, but I can't download the kubeconfig file from Velum. I got this error: -------- "internal server error , Login error" ------ My login to Velum works fine with the same credentials, however for the kubeconfig file download the authentication is failing. Let me know if anyone experiences the same. Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Rob de Canha-Knight Date: Tuesday, November 14, 2017 at 2:56 PM To: Rushi NS Cc: Simon Briggs , Vincent Untz Subject: Re: KUBEADM method install kubernetes clusters on SUSE 12 SP1/SP2 Rushi, As advised in the previous mail I'm unable to provide any additional support to you on this matter and I have to direct you to obtain support through the usual channels for any additional queries. 
So please reach out to the caasp-beta mailing list, or use Bugzilla to log a bug for investigation if you think the process is being followed correctly, as we have not seen this issue internally during 2.0 testing in any of our environments or other beta user environments; we would appreciate the bug report so it can be investigated and fixed by our engineering team if it is indeed a problem with the product. Please note though that due to our HackWeek that we run at SUSE this week, you may experience a slightly delayed response to both the caasp-beta mailing list as well as anything put through Bugzilla, as in effect our product and engineering teams are off this week. Rob ----- Rob de Canha-Knight EMEA Platform and Management Technical Strategist SUSE rob.decanha-knight at suse.com (P) +44 (0) 1635 937689 (M) +44 (0) 7392 087303 (TW) rssfed23 ---- From: "Ns, Rushi" Date: Tuesday, 14 November 2017 at 18:05 To: Rob de Canha-Knight Cc: Simon Briggs , Vincent Untz Subject: Re: KUBEADM method install kubernetes clusters on SUSE 12 SP1/SP2 Hi Rob, Did you get a chance to check my mail, and is there any solution to this problem? Do you think this is a bug in the release? Like I said, I have tried multi-master as well as single master; in both iterations the error result is the same. Do you think there are any proxy issues? As you know the systems are behind a proxy and I use proxy parameters during the setup. Here is my screenshot of the proxy settings. Let me know if there is any way to fix this. I can share my screen if you have a few minutes. This is really killing my team as I need to set up a SUSE-based Kubernetes, which I was trying to do with kubeadm, but I am still hoping CaaSP will overcome the issues with the kubeadm alternative; it is just not going as per my expectations. [cid:image015.png at 01D3629B.1C16B720] Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Rushi NS Date: Friday, November 10, 2017 at 3:09 PM To: Rob de Canha-Knight Cc: Simon Briggs , Vincent Untz Subject: Re: KUBEADM method install kubernetes clusters on SUSE 12 SP1/SP2 Hi Rob, I have tried using the Dashboard host as the admin node as you mentioned (the Velum host); after doing everything I got the same error. I think this could be a problem with multi-master. I did another test with a single master and it has the same error. I am not sure where this error comes from, but I did everything correctly based on your suggestion. [cid:image016.png at 01D3629B.1C16B720] Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Rushi NS Date: Friday, November 10, 2017 at 11:59 AM To: Rob de Canha-Knight Cc: Simon Briggs , Vincent Untz Subject: Re: KUBEADM method install kubernetes clusters on SUSE 12 SP1/SP2 Hi Rob, OK, got it. Because of multi-master I do require round robin, with either the admin node or something with a load balancer. Let me try this fix by rebuilding with multi-master, and if it fails then I will try with a single master. Keep you posted. Have a nice weekend. Best Regards, Rushi. Success is not a matter of being the best & winning the race. 
Success is a matter of handling the worst & finishing the race Sent from my iPhone please excuse typos and brevity On Nov 10, 2017, at 11:30, Rob de Canha-Knight > wrote: In the k8s external fqdn that must be a load balancer set up externally from the cluster if doing multi-master. The external dashboard fqdn must be the value of the fqdn that velum is running on the admin node. If your admin node is lvsusekub1 then put that in there. Doing multi-master on bare metal requires a loadbalancer and it?s that loadbalancer address that goes in the top box. If you don?t have a loadbalancer then you can put in any of the master node fqdns and it will work. So put lvsusekub3 in the top box and lvsusekub1 in the bottom box and you can do round robin DNS on your dns server. It?s worth noting that once you enter those values they are fixed and to change them you have to rebuild the cluster from scratch. If this is a development environment I recommend using a single master node and putting that value in the top box and the admin node fqdn in the bottom box. Start simple and build up from there. I?m signing off now for the weekend. Have a good weekend. ----- Rob de Canha-Knight EMEA Platform and Management Technical Strategist SUSE rob.decanha-knight at suse.com (P) +44 (0) 1635 937689 (M) +44 (0) 7392 087303 (TW) rssfed23 ---- From: "Ns, Rushi" > Date: Friday, 10 November 2017 at 19:24 To: Rob de Canha-Knight > Cc: Simon Briggs >, Vincent Untz > Subject: Re: KUBEADM method install kubernetes clusters on SUSE 12 SP1/SP2 Ok , I agree some point (two boxes only ..i put two boxes with same hostname ?lvsusekub3? and lvsusekube3.pal.sap.corp). I setup with 3 masters as I mentioned before and this host LVSUSEKUB3 is the first master node hostname . I did make sure everything right except FQDN Question>: what is second box I should put hostname My admin node: lvsusekub1 Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Rob de Canha-Knight > Date: Friday, November 10, 2017 at 11:19 AM To: Rushi NS > Cc: Simon Briggs >, Vincent Untz > Subject: Re: KUBEADM method install kubernetes clusters on SUSE 12 SP1/SP2 I've identified your problem. The first box is the k8s API endpoint. This field has to be set to the kubernetes master fqdn. I think you have it set to your admin node fqdn and that?s why things are not working. You?ll have to destroy your cluster and make sure that the top field in your screenshot has the fqdn of the k8s master node not the admin node (those two boxes must have different addresses in) ----- Rob de Canha-Knight EMEA Platform and Management Technical Strategist SUSE rob.decanha-knight at suse.com (P) +44 (0) 1635 937689 (M) +44 (0) 7392 087303 (TW) rssfed23 ---- From: "Ns, Rushi" > Date: Friday, 10 November 2017 at 19:17 To: Rob de Canha-Knight > Cc: Simon Briggs >, Vincent Untz > Subject: Re: KUBEADM method install kubernetes clusters on SUSE 12 SP1/SP2 Hi Rob, Answer to your queries. You must make sure that you are accessing velum from the right FQDN ? the one you gave velum during the setup process when it asks for the internal and external dashboard FQDN. I set this during API FQDN I did make sure no plugin blocks (java sript) Best Regards, Rushi. 
I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Rob de Canha-Knight > Date: Friday, November 10, 2017 at 11:13 AM To: Rushi NS > Cc: Vincent Untz >, Simon Briggs > Subject: Re: KUBEADM method install kubernetes clusters on SUSE 12 SP1/SP2 You must make sure that you are accessing velum from the right FQDN ? the one you gave velum during the setup process when it asks for the internal and external dashboard FQDN. Aside from that make sure you?ve not got any browser plugins that are blocking scripts or javascript from running. If you still cannot get it to work then you will have to wait for the 2.0 final release next week and try that. If you run into issues there I cannot help as it doesn?t fall into my role and you?ll have to use the official channels for support. ----- Rob de Canha-Knight EMEA Platform and Management Technical Strategist SUSE rob.decanha-knight at suse.com (P) +44 (0) 1635 937689 (M) +44 (0) 7392 087303 (TW) rssfed23 ---- From: "Ns, Rushi" > Date: Friday, 10 November 2017 at 19:09 To: Rob de Canha-Knight > Cc: Vincent Untz > Subject: Re: KUBEADM method install kubernetes clusters on SUSE 12 SP1/SP2 Thanks. I did the setup with 3 master and 1 minions and its worked nicely but while downloading kubectl file the authentication I set during velum setup is not accepted and I get error downloading the kubectl file > Also I got the error you stated (not being able to talk to the velum API. When this happens please refresh your browser page and accept the new certificate.) I refresh but I didn?t get any where accept new certificate but all worked. Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Rob de Canha-Knight > Date: Friday, November 10, 2017 at 10:32 AM To: Rushi NS > Cc: Vincent Untz > Subject: Re: KUBEADM method install kubernetes clusters on SUSE 12 SP1/SP2 It supports multi master and yes; your precious mail is correct. Sent from my iPhone - please excuse any shortness On 10 Nov 2017, at 18:29, Ns, Rushi > wrote: Hi Rob, Is this release supports multi master (controllers ? etcd) or single master. Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Rushi NS > Date: Friday, November 10, 2017 at 10:17 AM To: Rob de Canha-Knight > Cc: Vincent Untz > Subject: Re: KUBEADM method install kubernetes clusters on SUSE 12 SP1/SP2 Hi Rob, Perfect and Thanks, I just downloaded and will start deploying and keep you posted. As I understand 2.0 is removed the caasp-cli authentication ? and everything should work as it was before with 1.0 using kubeconfig file downloaded from VELUM web. Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Rob de Canha-Knight > Date: Friday, November 10, 2017 at 10:01 AM To: Rushi NS > Cc: Vincent Untz > Subject: Re: KUBEADM method install kubernetes clusters on SUSE 12 SP1/SP2 November 16th However; you can download our latest release candidate ISO from https://drive.google.com/file/d/1ZO0sduyV5GS3WThl0eLVjnMNHCaFIi5u/view?usp=sharing which doesn?t require you to use caasp-cli. One note; during the bootstrap process you will get an error at the top about not being able to talk to the velum API. When this happens please refresh your browser page and accept the new certificate. Once you have done this it will be able to talk to the API and you?re good to go. 
To obtain the kubeconfig file you click the button and this will redirect you to a new login page where you enter in your caas platform admin account credentials and it will offer your browser a download of the kubeconfig that has the correct client certificate in it. Many thanks, Rob ----- Rob de Canha-Knight EMEA Platform and Management Technical Strategist SUSE rob.decanha-knight at suse.com (P) +44 (0) 1635 937689 (M) +44 (0) 7392 087303 (TW) rssfed23 ---- From: "Ns, Rushi" > Date: Friday, 10 November 2017 at 17:58 To: Rob de Canha-Knight > Cc: Vincent Untz > Subject: Re: KUBEADM method install kubernetes clusters on SUSE 12 SP1/SP2 HI Rob, What is the ETA for 2.0 release ? Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Rob de Canha-Knight > Date: Tuesday, November 7, 2017 at 2:32 PM To: Rushi NS > Cc: Carsten Duch >, Johannes Grassler >, Michal Jura >, Nicolas Bock >, Simon Briggs >, Vincent Untz > Subject: Re: KUBEADM method install kubernetes clusters on SUSE 12 SP1/SP2 Thanks Rushi - yes sticking with CaaSP will make your life much easier and enable you to get support as well once a suitable support contract/agreement is in place. When 2.0 is released we will have an updated user manual and deployment guide in the usual place (https://www.suse.com/documentation/suse-caasp/index.html) for you to consume so don?t worry you won?t get in any trouble :) Rob Sent from my iPhone - please excuse any shortness On 7 Nov 2017, at 23:27, Ns, Rushi > wrote: Hi Rob, Thank you. Yes, I am sticking to ?CAASP? only , since had issues with authorization I wanted to try out with kubeadm to setup a cluster for our DMZ internet facing for federation. KUBEADM is working but its pain as CAASP works nice with everything based on PXE which is what I would like to have in my future builds. If you say the 2.0 is coming out next, then I will wait . please provide the doucemntation how you consume 2.0 , so that I don?t get any trouble. Thank you so much for your quick reply. Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE On 11/7/17, 2:22 PM, "Rob de Canha-Knight" > wrote: Hi Rushi. As mentioned on the thread I just sent you; the method Simon is referring to there is the manual upstream way to deploy Kubernetes. It is separate and very different from CaaSP and is completely unsupported in every way. As such; we cannot help you here with the kubeadm way in any way shape or form. Please stick with CaaSP for now if you can or want assistance from us. The version that doesn?t require you to use caasp-cli will be released by the end of next week (2.0 final) and you will be able to deploy that successfully and if you run into any issues we can help you. As a side note I kindly request that you use the CaaSP-beta mailing list for your queries as you did in the past or log a support ticket when you run into issues with the final release. You are likely to get a better response faster than emailing our product team directly plus the knowledge will be archived publicly for everyone else to benefit. Many thanks, Rob Sent from my iPhone - please excuse any shortness On 7 Nov 2017, at 23:13, Ns, Rushi > wrote: Hello Simon, How are you . Long time. I have some Question. Not sure if you can answer. As you know we are doing test of ?CAASP? from SUSE , however it is bit pain as CAASP-CLI authentication is boiling down the cluster without access. Rob is aware what I was talking. 
Since CaaSP still has the issue with caasp-cli, I was wondering whether SLES 12 SP1 can work with the KUBEADM method to install a cluster. Has anyone on your side tried it? I found this link but am not sure: https://forums.suse.com/archive/index.php/t-9637.html. Do you know who "simon (smflood)" is; is that you? On that link he said he installed with KUBEADM using SLES 12 SP1 and SP2, and he gave image links to https://software.opensuse.org/download.html?project=Virtualization%3Acontainers&package=kubernetes. Can someone help me with the KUBEADM method to install a kubernetes cluster on SUSE 12 SP1/SP2? Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE On 3/14/17, 2:26 AM, "Simon Briggs" wrote: Hi Rushi, I am part of the team delivering our Expert Day in Rome today so cannot make a call, but I want to make sure things are progressing for you. Please advise if Michal's advice worked or if you have new challenges we can help with. Thanks Simon Briggs On 10/03/17 09:10, Simon Briggs wrote: Hi Rushi, AJ has answered the CaaSP question. But I can help explain that SOC7 is now fully GA and can be downloaded freely from the https://www.suse.com/download-linux/ Cloud click-through. Thanks Simon On 09/03/17 21:54, Ns, Rushi wrote: Hi Michal, Any update on this? I am eagerly waiting for the change, as I will start the setup again when SOC7 GA comes out. @Vincent: Do you know when SOC7 GA comes out? Also CaaS Beta? Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE On 2/23/17, 7:14 AM, "Ns, Rushi" wrote: Hi Michal, Good to hear that it's doable; yes, please test at your end and let me know. I will wait for your confirmation and the procedure for how to consume our designated SDN VLAN. Best Regards, Rushi. Success is not a matter of being the best & winning the race. Success is a matter of handling the worst & finishing the race Sent from my iPhone please excuse typos and brevity On Feb 23, 2017, at 03:04, Michal Jura wrote: Hi Rushi, It should be possible to use VLAN ID 852 for the Magnum private network. You should configure a network named "private" in advance with VLAN ID 852, but I have to test it first. Changing the subnet to 192.168.x.x should be doable too, but I have to check it. Please give me some time and I will come back to you. Best regards, Michal On 02/22/2017 11:01 PM, Ns, Rushi wrote: Hi Carsten, Thank you. As you know we have VLAN ID *852* as the SDN in network.json, which is already configured at our switch level. Here I have a question or suggestion. Can I use this VLAN 852 for the Magnum side as L2 traffic? We do not want to use 10.x.x.x IP space, so we use a non-routable 192.168.x.x IP space which will route through our 852 VLAN. Is it possible to define this in the Heat template, so that cluster deployment generates a 192.168.x.x subnet instead of a 10.x.x.x subnet when a kubernetes cluster is created? Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Carsten Duch Date: Wednesday, February 22, 2017 at 10:21 AM To: "Ns, Rushi", Johannes Grassler, Michal Jura, Vincent Untz Cc: Nicolas Bock, Simon Briggs Subject: RE: Weekly review of SAP Big Data SOC 7 testing Hi Rushi, The problem is that you have configured it to use the VLANs from 222 to 2222. You have to choose a range which is allowed on the trunk port and not already in use. 
If you want to change the starting point you have to redeploy the whole cloud and provide the correct VLAN ID when editing the network.json. Without that, you are only able to change the maximum number up to a value you are able to use, maybe 50 for 222 to 272. Or try vxlan instead of vlan again. But I think that the overall problem is a misconfigured switch. Make sure that all VLAN IDs are allowed for the trunk and you will have a good chance that it works. Sent from my Samsung Galaxy smartphone. -------- Original message -------- From: "Ns, Rushi" Date: 22.02.17 19:04 (GMT+01:00) To: Carsten Duch, Johannes Grassler, Michal Jura, Vincent Untz Cc: Nicolas Bock, Simon Briggs Subject: Re: Weekly review of SAP Big Data SOC 7 testing Hi Carsten, Yes, I am aware, as we discussed this during our call and after reading your response; however, the VLAN range 222-322 is already used in our production. In particular, 271 is our laptop VLAN (all employees' laptops get their IP addresses from it), which we cannot use for this. I am looking for alternatives. Let me know if you have any idea other than allowing 222-322. Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE On 2/21/17, 10:38 PM, "Carsten Duch" wrote: Hi Rushi, have you tried to configure your switch according to my email from the 14th? Maybe you didn't get the mail? I suggested the following configuration on the switch: You are using linuxbridge with vlan. Make sure to allow tagging of VLANs on the switch and add the range to the allowed VLANs for the trunk. The range is defined by your fixed VLAN and the maximum number of VLANs. Starting point: fixed VLAN ID = 222 + Maximum Number of VLANs configured in the Neutron barclamp = 2000. That means you have to allow a range from 222 to 2222 on your switch side. But I would recommend reducing the maximum so that it will not overlap with other existing VLANs. You can reduce it to 100 or something lower and then allow a range from 222 to 322 for the trunk port. You don't need to create all the VLANs manually, but you need to allow VLAN tagging for the port and allow a range. Depending on your switch, the configuration should look something like: switchport trunk allowed vlan 222-322 http://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus5000/sw/configuration/guide/cli/CLIConfigurationGuide/AccessTrunk.html Make sure to allow all the VLANs from your network.json for the trunk port. On 21.02.2017 23:40, Ns, Rushi wrote: Hi Michal, Yes, that's obviously the root cause I found before your email, but it is cumbersome to understand the flow of the segmentation ID, which I need to discuss how we can overcome. What I observe is that every time I create a new cluster the private network gets a new segment ID: 271, 272, 273 and so on (this is like a VLAN), which our floating VLAN can only reach once we add this segment ID (dummy ID 231, 232 or whatever gets generated) to our switch as a real VLAN; otherwise the private network subnet cannot reach the floating IP. The attached picture contains the segmentation ID information. I remember I recently had one session with one of your SUSE people (carsten.duch at suse.com); I shared my screen and we discussed this network segment issue (Software Defined Networking) and he answered some of it, however it appeared to be beyond his knowledge. I have CC'd Carsten here, so you can talk to him. 
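As a side note on those segmentation IDs, the ID Neutron assigned to a given tenant network can be read back with the OpenStack CLI, so you can see which VLAN has to be allowed on the trunk before touching the switch. A rough sketch, assuming admin credentials are already sourced and the Magnum-created network is named "private":

# openstack network list
# openstack network show private -c "provider:network_type" -c "provider:segmentation_id"

The second command prints the network type (vlan) and the segmentation ID, for example 271.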
Do you have any idea what needs to be done on the physical network swtich level where the VLANs already connected but not this VLAN (271, 272,whatever) because this is really not easy to allow in real network switch configuration of the VLAN to allow this trunked port which doesn?t exist at all. We had the same issue before in deploying cloud foundry on top of openstack and we fool the switch with the private segment ID created and at the end we found this is a bug in openstack SDN side. Let me know what needs to be done and I can do that. Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE On 2/21/17, 4:27 AM, "Michal Jura" > wrote: Hi, This problem looks like there is no connection from private network where kube-master and kube-mionions are launched to Heat PublicURL endpoint. Please fix network configuration. On 02/20/2017 08:10 PM, Johannes Grassler wrote: Hello Rushi, alright, so we are creating a cluster now but the Kubernetes master fails to signal success to the Heat API (that's what WaitConditionTimeout means). Unfortunately this is where debugging becomes fairly hard...can you ssh to the cluster's Kubernetes master and get me /var/log/cloud-init.log and /var/log/cloud-init-output.log please? Maybe we are lucky and find the cause of the problem in these logs. If there's nothing useful in there I'll probably have to come up with some debugging instrumentation next... Cheers, Johannes On 02/20/2017 07:53 PM, Ns, Rushi wrote: Hi Johannes, Thanks, I just tried with the changes you mentioned and I see that it made some progress this time (creating private network subnet, heat stack and instance as well cluster ) , however after some time it failed with ? CREATE_FAILED? status. Here is the log incase if you want to dig in more. ================ 2017-02-20 09:01:18.148 92552 INFO oslo.messaging._drivers.impl_rabbit [-] [6c39b368-bbdf-40cd-b1a5-b14da062f692] Reconnected to AMQP server on 10.48.220.40:5672 via [amqp] clientwith port 36265. 2017-02-20 10:36:25.914 92552 INFO magnum.conductor.handlers.cluster_conductor [req-dddc4477-407f-4b84-afbd-f8b657fd02c6 admin openstack - - -] The stack None was not found during cluster deletion. 2017-02-20 10:36:26.515 92552 WARNING magnum.common.cert_manager.local_cert_manager [req-dddc4477-407f-4b84-afbd-f8b657fd02c6 admin openstack - - -] Deleting certificate e426103d-0ecf-4044-9383-63305c667a c2 from the local filesystem. CertManager type 'local' should be used for testing purpose. 2017-02-20 10:36:26.517 92552 WARNING magnum.common.cert_manager.local_cert_manager [req-dddc4477-407f-4b84-afbd-f8b657fd02c6 admin openstack - - -] Deleting certificate a9a20d33-7b54-4393-8385-85c4900a0f 79 from the local filesystem. CertManager type 'local' should be used for testing purpose. 2017-02-20 10:37:39.905 92552 WARNING magnum.common.cert_manager.local_cert_manager [req-d50c84af-7eca-4f76-8e2b-dc49933d0376 admin openstack - - -] Storing certificate data on the local filesystem. CertM anager type 'local' should be used for testing purpose. 2017-02-20 10:37:40.049 92552 WARNING magnum.common.cert_manager.local_cert_manager [req-d50c84af-7eca-4f76-8e2b-dc49933d0376 admin openstack - - -] Storing certificate data on the local filesystem. CertM anager type 'local' should be used for testing purpose. 
2017-02-20 10:48:48.172 92552 ERROR magnum.conductor.handlers.cluster_conductor [req-ac20eb45-8ba9-4b73-a771-326122e94ad7 522958fb-fd7c-4c33-84d2-1ae9e60c1574 - - - -] Cluster error, stack status: CREATE_ FAILED, stack_id: e47d528d-f0e7-4a40-a0d3-12501cf5a984, reason: Resource CREATE failed: WaitConditionTimeout: resources.kube_masters.resources[0].resources.master_wait_condition: 0 of 1 received 2017-02-20 10:48:48.510 92552 INFO magnum.service.periodic [req-ac20eb45-8ba9-4b73-a771-326122e94ad7 522958fb-fd7c-4c33-84d2-1ae9e60c1574 - - - -] Sync up cluster with id 15 from CREATE_IN_PROGRESS to CRE ATE_FAILED. Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE On 2/20/17, 9:51 AM, "Johannes Grassler" > wrote: Hello Rushi, I took a closer look at the SUSE driver and `--discovery-url none` will definitely take care of any etcd problems. The thing I'm not quite so sure about is the registry bit. Can you please try the following... magnum cluster-template-create --name k8s_template\ --image-id sles-openstack-magnum-kubernetes \ --keypair-id default \ --external-network-id floating \ --dns-nameserver 8.8.8.8 \ --flavor-id m1.magnum \ --master-flavor-id m1.magnum \ --docker-volume-size 5 \ --network-driver flannel \ --coe kubernetes \ --floating-ip-enabled \ --tls-disabled \ --http-proxy http://proxy.pal.sap.corp:8080 magnum cluster-create --name k8s_cluster \ --cluster-template k8s_template \ --master-count 1 \ --node-count 2 \ --discovery-url none ...and see if that yields a working cluster for you? It still won't work in a completely disconnected environment, but with the proxy you have in place it should work. Some explanation: the --discovery-url none will disable the validation check that causes the GetDiscoveryUrlFailed error, allowing Magnum to instantiate the Heat template making up the cluster. The --http-proxy http://proxy.pal.sap.corp:8080 will then cause the cluster to try and access the Docker registry through the proxy. As far as I understand our driver, the --registry-enabled --labels registry_url=URL will require you to set up a local docker registry in a network reachable from the Magnum bay's instances and specify a URL pointing to that docker registry. I'd rather not ask you to do that if access through the proxy turns out to work. Cheers, Johannes On 02/20/2017 04:23 PM, Ns, Rushi wrote: Hi Johannes, I have also added https_proxy parameter thought it might need both (http and https) but even that failed too. I see the log expected to have discovery etcd. magnum cluster-template-create --name k8s_template --image-id sles-openstack-magnum-kubernetes --keypair-id default --external-network-id floating --dns-nameserver 8.8.8.8 --flavor-id m1.magnum --master-flavor-id m1.magnum --docker-volume-size 5 --network-driver flannel --coe kubernetes --floating-ip-enabled --tls-disabled --http-proxy http://proxy.pal.sap.corp:8080 --https-proxy http://proxy.pal.sap.corp:8080 magnum-conductor.log ===================== 2017-02-20 07:17:50.390 92552 ERROR oslo_messaging.rpc.server discovery_endpoint=discovery_endpoint) 2017-02-20 07:17:50.390 92552 ERROR oslo_messaging.rpc.server GetDiscoveryUrlFailed: Failed to get discovery url from 'https://discovery.etcd.io/new?size=1'. 2017-02-20 07:17:50.390 92552 ERROR oslo_messaging.rpc.server Best Regards, Rushi. 
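A quick way to confirm whether the controller can actually reach the public discovery endpoint through the proxy quoted above is to request a discovery URL by hand; a sketch, run on the node where magnum-conductor runs:

# curl -x http://proxy.pal.sap.corp:8080 "https://discovery.etcd.io/new?size=1"

If this returns a discovery URL, the proxy path works; if it fails, the proxy is either unreachable from that node or not being applied to the request, which would match the GetDiscoveryUrlFailed error in the log.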
I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE On 2/20/17, 7:16 AM, "Ns, Rushi" > wrote: Hello Johannes, No luck even after adding the internet proxy at the time of cluster template creation and without specify anything at the cluster-create . The cluster create failed and this time I don?t see anything like , no heat stack created, no private kubernetes network subnet created and many. Here are the commands I tried. Let me know if this is how supposed to be used or am I doing something wrong. magnum cluster-template-create --name k8s_template --image-id sles-openstack-magnum-kubernetes --keypair-id default --external-network-id floating --dns-nameserver 8.8.8.8 --flavor-id m1.magnum --master-flavor-id m1.magnum --docker-volume-size 5 --network-driver flannel --coe kubernetes --floating-ip-enabled --tls-disabled --http-proxy http://proxy.pal.sap.corp:8080 magnum cluster-create --name k8s_cluster --cluster-template k8s_template --master-count 1 --node-count 2 this is the magnum-conductor.log I see something more needed . 2017-02-20 06:55:27.245 92552 ERROR magnum.drivers.common.template_def [-] HTTPSConnectionPool(host='discovery.etcd.io', port=443): Max retries exceeded with url: /new?size=1 (Caused by NewConnectionError (': Failed to establish a new connection: [Errno 113] EHOSTUNREACH',)) 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server [-] Exception during message handling 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server Traceback (most recent call last): 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 133, in _process_incoming 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message) 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 150, in dispatch 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args) 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 121, in _do_dispatch 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args) 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/magnum/conductor/handlers/cluster_conductor.py", line 165, in cluster_create 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server create_timeout) 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/magnum/conductor/handlers/cluster_conductor.py", line 97, in _create_stack 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server _extract_template_definition(context, cluster)) 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/magnum/conductor/handlers/cluster_conductor.py", line 82, in _extract_template_definition 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server scale_manager=scale_manager) 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/magnum/drivers/common/template_def.py", line 337, in extract_definition 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server self.get_params(context, cluster_template, cluster, **kwargs), 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server File 
"/usr/lib/python2.7/site-packages/magnum/drivers/k8s_opensuse_v1/template_def.py", line 50, in get_params 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server extra_params['discovery_url'] = self.get_discovery_url(cluster) 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/magnum/drivers/common/template_def.py", line 445, in get_discovery_url 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server discovery_endpoint=discovery_endpoint) 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server GetDiscoveryUrlFailed: Failed to get discovery url from 'https://discovery.etcd.io/new?size=1'. 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server [-] Can not acknowledge message. Skip processing 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server Traceback (most recent call last): 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 126, in _process_incoming 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server message.acknowledge() 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 119, in acknowledge 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server self.message.acknowledge() 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 251, in acknowledge 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server self._raw_message.ack() 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/kombu/message.py", line 88, in ack 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server self.channel.basic_ack(self.delivery_tag) 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 1584, in basic_ack 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server self._send_method((60, 80), args) 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 56, in _send_method 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server self.channel_id, method_sig, args, content, 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 221, in write_method 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server write_frame(1, channel, payload) 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/amqp/transport.py", line 188, in write_frame 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server frame_type, channel, size, payload, 0xce, 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 385, in sendall 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server tail = self.send(data, flags) 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 379, in send 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server return self._send_loop(self.fd.send, data, flags) 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 
366, in _send_loop 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server return send_method(data, *args) 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server error: [Errno 104] Connection reset by peer 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server 2017-02-20 06:55:27.310 92552 ERROR oslo.messaging._drivers.impl_rabbit [-] [6c39b368-bbdf-40cd-b1a5-b14da062f692] AMQP server on 10.48.220.40:5672 is unreachable: . Trying again in 1 seconds. Client port: 50462 2017-02-20 06:55:28.347 92552 INFO oslo.messaging._drivers.impl_rabbit [-] [6c39b368-bbdf-40cd-b1a5-b14da062f692] Reconnected to AMQP server on 10.48.220.40:5672 via [amqp] clientwith port 58264. 2017-02-20 06:59:09.827 92552 INFO magnum.conductor.handlers.cluster_conductor [req-9b6be3b8-d2fd-4e34-9d08-33d66a270fb1 admin openstack - - -] The stack None was not found during cluster deletion. 2017-02-20 06:59:10.400 92552 WARNING magnum.common.cert_manager.local_cert_manager [req-9b6be3b8-d2fd-4e34-9d08-33d66a270fb1 admin openstack - - -] Deleting certificate 105d39e9-ca2a-497c-b951-df87df2a02 24 from the local filesystem. CertManager type 'local' should be used for testing purpose. 2017-02-20 06:59:10.402 92552 WARNING magnum.common.cert_manager.local_cert_manager [req-9b6be3b8-d2fd-4e34-9d08-33d66a270fb1 admin openstack - - -] Deleting certificate f0004b69-3634-4af9-9fec-d3fdba074f 4c from the local filesystem. CertManager type 'local' should be used for testing purpose. 2017-02-20 07:02:37.658 92552 WARNING magnum.common.cert_manager.local_cert_manager [req-0d9720f7-eb3e-4c9f-870b-e26feb26b9e2 admin openstack - - -] Storing certificate data on the local filesystem. CertM anager type 'local' should be used for testing purpose. 2017-02-20 07:02:37.819 92552 WARNING magnum.common.cert_manager.local_cert_manager [req-0d9720f7-eb3e-4c9f-870b-e26feb26b9e2 admin openstack - - -] Storing certificate data on the local filesystem. CertM anager type 'local' should be used for testing purpose. 
2017-02-20 07:02:40.026 92552 ERROR magnum.drivers.common.template_def [req-0d9720f7-eb3e-4c9f-870b-e26feb26b9e2 admin openstack - - -] HTTPSConnectionPool(host='discovery.etcd.io', port=443): Max retries exceeded with url: /new?size=1 (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 113] EH OSTUNREACH',)) 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server [req-0d9720f7-eb3e-4c9f-870b-e26feb26b9e2 admin openstack - - -] Exception during message handling 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server Traceback (most recent call last): 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 133, in _process_incoming 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message) 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 150, in dispatch 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args) 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 121, in _do_dispatch 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args) 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/magnum/conductor/handlers/cluster_conductor.py", line 165, in cluster_create 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server create_timeout) 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/magnum/conductor/handlers/cluster_conductor.py", line 97, in _create_stack 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server _extract_template_definition(context, cluster)) 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/magnum/conductor/handlers/cluster_conductor.py", line 82, in _extract_template_definition 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server scale_manager=scale_manager) 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/magnum/drivers/common/template_def.py", line 337, in extract_definition 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server self.get_params(context, cluster_template, cluster, **kwargs), 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/magnum/drivers/k8s_opensuse_v1/template_def.py", line 50, in get_params 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server extra_params['discovery_url'] = self.get_discovery_url(cluster) 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/magnum/drivers/common/template_def.py", line 445, in get_discovery_url 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server discovery_endpoint=discovery_endpoint) 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server GetDiscoveryUrlFailed: Failed to get discovery url from 'https://discovery.etcd.io/new?size=1'. 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server Best Regards, Rushi. 
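After a failed attempt like the one above it is also worth checking whether the proxy settings actually landed on the cluster template; a sketch using the Magnum CLI of that release, assuming the template is still named k8s_template:

# magnum cluster-template-show k8s_template
# magnum cluster-list

The first command should show the http_proxy field populated; the second shows the cluster status while Heat is still working on it.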
I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE On 2/20/17, 12:41 AM, "Johannes Grassler" > wrote: Hello Rushi, On 02/20/2017 12:26 AM, Ns, Rushi wrote: Hi Johannes/Vincent Thank you to both for the detailed. I did those steps as per the link https://www.suse.com/documentation/suse-openstack-cloud-7/book_cloud_suppl/data/sec_deploy_kubernetes_without.html you provided before executing the cluster as I learned this in the document , however I am sure I did something wrong as ii don?t know what public etcd discovery url since I don?t have anything setup on my end. Here are the command I used and if you see I specified that parameter as you suggested but only as ?URL? without knowing the real value of ?URL? (--labels registry_url=URL) , so this is my mistake or how it should be used ? I am not sure, but I followed your document ? ---------------------------------- 1) magnum cluster-template-create --name k8s_template --image-id sles-openstack-magnum-kubernetes --keypair-id default --external-network-id floating --dns-nameserver 8.8.8.8 --flavor-id m1.magnum --master-flavor-id m1.magnum --docker-volume-size 5 --network-driver flannel --coe kubernetes --floating-ip-enabled --tls-disabled --registry-enabled --labels insecure_registry_url=URL 2) magnum cluster-create --name k8s_cluster --cluster-template k8s_template --master-count 1 --node-count 2 --discovery-url none ----------------------------------- Now I would like to understand where and how I can setup my own local etcd discovery service ? is it required. As far as I know etcd it is. I may be wrong though. Luckily there is another solution: Also our internet access is through proxy port (http://proxy.pal.sap.corp:8080) so if you can guide how to do that setup, I can do or tell me the URL value to specified and I can try. Just add an `--http-proxy http://proxy.pal.sap.corp:8080` <%20%20> when creating the cluster template and do NOT provide any discovery URL options for either the cluster template or the cluster itself. Provided the proxy doesn't require authentication this should do the trick... Cheers, Johannes Also I wanted to inform that, we had issue Horizon (public and admin page IP is not hand shake) with BETA 8 Neutron going with VLAN open switch, Nicolas and I had some sessions towards and Nicolas suggested to use ?LinuxBridge instead openvswith? since the patch he has may not be in the BETA8 that I download. . you can check with Nicolas on this as our current BEtA8 seems not good with VLAN/openvswitch. At any cost, I will remove this cluster and rebuild it soon but I wail wait until the full GA build comes out instead of BETA 8 or I can try if you can think the latest BETA 8 will not have issues overall. Please suggest and provide me the help for above value ?labels insecure_registry_url=URL? or how to setup local etc discovery service ? Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE On 2/17/17, 1:14 AM, "Vincent Untz" > wrote: Hi, Le vendredi 17 f?vrier 2017, ? 10:02 +0100, Johannes Grassler a ?crit : Hello Rushi, sorry, this took me a while to figure out. This is not the issue I initially thought it was. Rather it appears to be related to your local networking setup and/or the cluster template you used. 
This is the crucial log excerpt: | 2017-02-05 21:32:52.915 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/magnum/drivers/common/template_def.py", line 445, in get_discovery_url | 2017-02-05 21:32:52.915 92552 ERROR oslo_messaging.rpc.server discovery_endpoint=discovery_endpoint) | 2017-02-05 21:32:52.915 92552 ERROR oslo_messaging.rpc.server GetDiscoveryUrlFailed: Failed to get discovery url from 'https://discovery.etcd.io/new?size=1'. Magnum uses etcd to orchestrate its clusters' instances. To that end it requires a discovery URL where cluster members announce their presence. By default Magnum uses the public etcd discovery URL https://discovery.etcd.io/new?size=%(size)d This will not work in an environment without Internet access which I presume yours is. The solution to this problem is to set up a local etcd discovery service and configure its URL template in magnum.conf: [cluster] etcd_discovery_service_endpoint_format = https://my.discovery.service.local/new?size=%(size)d Ah, this use case is in our doc. Rushi, can you follow what's documented at: https://www.suse.com/documentation/suse-openstack-cloud-7/book_cloud_suppl/data/sec_deploy_kubernetes_without.html Vincent Cheers, Johannes On 02/16/2017 05:03 AM, Ns, Rushi wrote: Hi Simon. Some reason the mail I sent this morning didn?t go, also did?t bounced back but I found it was stuck in my drafts. Anyways, sorry about the delay . Here you go again. Please find attached files of magnum as requested. Please find below output of other commands result. ------ root at d38-ea-a7-93-e6-64:/var/log # openstack user list +----------------------------------+---------------------+ | ID | Name | +----------------------------------+---------------------+ | d6a6e5c279734387ae2458ee361122eb | admin | | 7cd6e90b024e4775a772449f3aa135d9 | crowbar | | ea68b8bd8e0e4ac3a5f89a4e464b6054 | glance | | c051a197ba644a25b85e9f41064941f6 | cinder | | 374f9b824b9d43d5a7d2cf37505048f0 | neutron | | 062175d609ec428e876ee8f6e0f39ad3 | nova | | f6700a7f9d794819ab8fa9a07997c945 | heat | | dd22c62394754d95a8feccd44c1e2857 | heat_domain_admin | | 9822f3570b004cdca8b360c2f6d4e07b | aodh | | ac06fd30044e427793f7001c72f92096 | ceilometer | | d694b84921b04f168445ee8fcb9432b7 | magnum_domain_admin | | bf8783f04b7a49e2adee33f792ae1cfb | magnum | | 2289a8f179f546239fe337b5d5df48c9 | sahara | | 369724973150486ba1d7da619da2d879 | barbican | | 71dcd06b2e464491ad1cfb3f249a2625 | manila | | e33a098e55c941e7a568305458e2f8fa | trove | +----------------------------------+---------------------+ root at d38-ea-a7-93-e6-64:/var/log # openstack domain list +----------------------------------+---------+---------+-------------------------------------------+ | ID | Name | Enabled | Description | +----------------------------------+---------+---------+-------------------------------------------+ | default | Default | True | The default domain | | f916a54a4c0b4a96954bad9f9b797cf3 | heat | True | Owns users and projects created by heat | | 51557fee0408442f8aacc86e9f8140c6 | magnum | True | Owns users and projects created by magnum | +----------------------------------+---------+---------+-------------------------------------------+ root at d38-ea-a7-93-e6-64:/var/log # openstack role assignment list +----------------------------------+----------------------------------+-------+----------------------------------+----------------------------------+-----------+ | Role | User | Group | Project | Domain | Inherited | 
+----------------------------------+----------------------------------+-------+----------------------------------+----------------------------------+-----------+ | 6c56316ecd36417184629f78fde5694c | d6a6e5c279734387ae2458ee361122eb | | 6d704aa281874622b02a4e24954ede18 | | False | | 9fe2ff9ee4384b1894a90878d3e92bab | 7cd6e90b024e4775a772449f3aa135d9 | | 7a18242f8e1c4dd9b42d31facb79493f | | False | | 6c56316ecd36417184629f78fde5694c | d6a6e5c279734387ae2458ee361122eb | | 7a18242f8e1c4dd9b42d31facb79493f | | False | | 932db80652074571ba1b98738c5af598 | 7cd6e90b024e4775a772449f3aa135d9 | | 7a18242f8e1c4dd9b42d31facb79493f | | False | | 9fe2ff9ee4384b1894a90878d3e92bab | ea68b8bd8e0e4ac3a5f89a4e464b6054 | | 19c2c03e858b47da83eda020aa83639e | | False | | 6c56316ecd36417184629f78fde5694c | ea68b8bd8e0e4ac3a5f89a4e464b6054 | | 19c2c03e858b47da83eda020aa83639e | | False | | 9fe2ff9ee4384b1894a90878d3e92bab | c051a197ba644a25b85e9f41064941f6 | | 19c2c03e858b47da83eda020aa83639e | | False | | 6c56316ecd36417184629f78fde5694c | c051a197ba644a25b85e9f41064941f6 | | 19c2c03e858b47da83eda020aa83639e | | False | | 9fe2ff9ee4384b1894a90878d3e92bab | 374f9b824b9d43d5a7d2cf37505048f0 | | 19c2c03e858b47da83eda020aa83639e | | False | | 6c56316ecd36417184629f78fde5694c | 374f9b824b9d43d5a7d2cf37505048f0 | | 19c2c03e858b47da83eda020aa83639e | | False | | 9fe2ff9ee4384b1894a90878d3e92bab | 062175d609ec428e876ee8f6e0f39ad3 | | 19c2c03e858b47da83eda020aa83639e | | False | | 6c56316ecd36417184629f78fde5694c | 062175d609ec428e876ee8f6e0f39ad3 | | 19c2c03e858b47da83eda020aa83639e | | False | | 9fe2ff9ee4384b1894a90878d3e92bab | f6700a7f9d794819ab8fa9a07997c945 | | 19c2c03e858b47da83eda020aa83639e | | False | | 6c56316ecd36417184629f78fde5694c | f6700a7f9d794819ab8fa9a07997c945 | | 19c2c03e858b47da83eda020aa83639e | | False | | 932db80652074571ba1b98738c5af598 | d6a6e5c279734387ae2458ee361122eb | | 7a18242f8e1c4dd9b42d31facb79493f | | False | | 6c56316ecd36417184629f78fde5694c | dd22c62394754d95a8feccd44c1e2857 | | | f916a54a4c0b4a96954bad9f9b797cf3 | False | | 9fe2ff9ee4384b1894a90878d3e92bab | 9822f3570b004cdca8b360c2f6d4e07b | | 19c2c03e858b47da83eda020aa83639e | | False | | 6c56316ecd36417184629f78fde5694c | 9822f3570b004cdca8b360c2f6d4e07b | | 19c2c03e858b47da83eda020aa83639e | | False | | 9fe2ff9ee4384b1894a90878d3e92bab | ac06fd30044e427793f7001c72f92096 | | 19c2c03e858b47da83eda020aa83639e | | False | | 6c56316ecd36417184629f78fde5694c | ac06fd30044e427793f7001c72f92096 | | 19c2c03e858b47da83eda020aa83639e | | False | | 6c56316ecd36417184629f78fde5694c | d694b84921b04f168445ee8fcb9432b7 | | | 51557fee0408442f8aacc86e9f8140c6 | False | | 9fe2ff9ee4384b1894a90878d3e92bab | bf8783f04b7a49e2adee33f792ae1cfb | | 19c2c03e858b47da83eda020aa83639e | | False | | 6c56316ecd36417184629f78fde5694c | bf8783f04b7a49e2adee33f792ae1cfb | | 19c2c03e858b47da83eda020aa83639e | | False | | 9fe2ff9ee4384b1894a90878d3e92bab | 2289a8f179f546239fe337b5d5df48c9 | | 19c2c03e858b47da83eda020aa83639e | | False | | 6c56316ecd36417184629f78fde5694c | 2289a8f179f546239fe337b5d5df48c9 | | 19c2c03e858b47da83eda020aa83639e | | False | | 9fe2ff9ee4384b1894a90878d3e92bab | 369724973150486ba1d7da619da2d879 | | 19c2c03e858b47da83eda020aa83639e | | False | | 6c56316ecd36417184629f78fde5694c | 369724973150486ba1d7da619da2d879 | | 19c2c03e858b47da83eda020aa83639e | | False | | 9fe2ff9ee4384b1894a90878d3e92bab | 71dcd06b2e464491ad1cfb3f249a2625 | | 19c2c03e858b47da83eda020aa83639e | | False | | 6c56316ecd36417184629f78fde5694c | 
71dcd06b2e464491ad1cfb3f249a2625 | | 19c2c03e858b47da83eda020aa83639e | | False | | 9fe2ff9ee4384b1894a90878d3e92bab | e33a098e55c941e7a568305458e2f8fa | | 19c2c03e858b47da83eda020aa83639e | | False | | 6c56316ecd36417184629f78fde5694c | e33a098e55c941e7a568305458e2f8fa | | 19c2c03e858b47da83eda020aa83639e | | False | +----------------------------------+----------------------------------+-------+----------------------------------+----------------------------------+-----------+ Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE On 2/15/17, 11:01 AM, "Ns, Rushi" > wrote: Hi Simon, I am sorry, I got stuck. Sure I will send the logs now . Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE On 2/15/17, 10:26 AM, "Simon Briggs" > wrote: Hi Rushi, I assume you where unable to join our call. Would it be possible to collect the logs that we request, as this is the only way my teams can help you remotely. Regards Simon Briggs On 15/02/17 08:58, Johannes Grassler wrote: Hello Rushi, ok. Can you please supply 1) A supportconfig tarball: this will have the contents of both /etc/magnum/magnum.conf.d/ and magnum-conductor.log which should allow me to figure out what is wrong. 2) The output of `openstack user list`, `openstack domain list`, `openstack role assignment list` (all run as the admin user). With that information I should be able to figure out whether your problem is the one I mentioned earlier. Cheers, Johannes On 02/14/2017 04:42 PM, Ns, Rushi wrote: Hello Johannes, Thank you for the information. FYI, my setup is not on HA . Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE On 2/14/17, 12:43 AM, "Johannes Grassler" > wrote: Hello Rushi, if the problem is the | Creating cluster failed for the following reason(s): Failed to create trust Error ID: c7a27e1f-6a6a-452e-8d29-a38dbaa3fd78, Failed to create trust Error ID: a9f328cc-05e8-4c87-9876-7db5365812f2 error you mentioned below, the problem is likely to be with Magnum rather than with Heat. Magnum creates a Keystone trust for each cluster that the cluster's VMs use to talk to the Magnum API among others. We had a spell of trouble[0] with that recently and you may be running into the same problem, especially if you are running an HA setup. Are you? If so, check if all files in /etc/magnum/magnum.conf.d/ match across all controller nodes. If there are differences, especially in the [trust] section you are probably affected by the same issue we ran into recently. Cheers, Johannes [0] https://github.com/crowbar/crowbar-openstack/pull/843 On 02/14/2017 09:01 AM, Simon Briggs wrote: Hi Rushi, You advise that you still have an issue. Would this still be the same as the one that Vincent helped with below? I have added Johannes to those CC'd as he is skilled in debugging that type of error. Thanks Simon Sent from my Samsung device -------- Original message -------- From: Vincent Untz > Date: 06/02/2017 12:39 (GMT+02:00) To: Rushi Ns > Cc: Michal Jura >, Nicolas Bock >, Simon Briggs > Subject: Re: Weekly review of SAP Big Data SOC 7 testing Rushi, About the "Failed to create trust": can you check the heat logs? My guess is that the error comes from there and more context about what's happening around that error would probably be useful. Thanks, Vincent Le lundi 06 f?vrier 2017, ? 04:01 +0000, Ns, Rushi a ?crit : Hi Simon Thank you. Please try if Michal can give some information about the image of kubernetes and how to consume. 
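On Vincent's suggestion above to check the Heat logs for the "Failed to create trust" error, a rough way to pull the surrounding context on the controller node (the log paths are the usual packaged defaults and may differ):

# grep -i -B 2 -A 10 "trust" /var/log/heat/heat-engine.log
# grep -i "error" /var/log/keystone/keystone.log | tail -n 50

Since Magnum creates a Keystone trust for each cluster, the Keystone log is worth a look as well.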
As for me, I have had full knowledge of kubernetes for a long time; we also run kubernetes in production in Germany for many projects which I did. Anyway, please try to get Michal for a 1 or 2 hour discussion so that I get an idea, and please also help me find the image, since the link provided is not available at this time. http://download.suse.de/ibs/Devel:/Docker:/Images:/SLE12SP2-JeOS-k8s-magnum/images/sles-openstack-magnum-kubernetes.x86_64.qcow2 @Michal: Would you be kind enough to help me get the Kubernetes image, as the above link is not working? Regarding SAHARA, I made progress uploading the image (Mirantis-prepared SAHARA Hadoop images) and created the necessary configuration (cluster templates, node templates and everything), but at the end, creating a cluster from the template errored with the following, so I really need someone from your team with SAHARA knowledge to help get the issue fixed. Here is the error while creating the cluster: Creating cluster failed for the following reason(s): Failed to create trust Error ID: c7a27e1f-6a6a-452e-8d29-a38dbaa3fd78, Failed to create trust Error ID: a9f328cc-05e8-4c87-9876-7db5365812f2 Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Simon Briggs Date: Saturday, February 4, 2017 at 1:57 AM To: "Ns, Rushi" Cc: Michal Jura, Nicolas Bock, Vincent Untz Subject: Re: Weekly review of SAP Big Data SOC 7 testing Hi Rushi, Thanks for the update and I'm glad we are moving forward. Well done everyone. Michal is indeed an expert around these services, though I am aware he is presently on a sprint team mid-cycle, so he may find it difficult to handle his required workload and deal with external work as well. So please be patient if it takes a small amount of time for him to respond. Thanks Simon Sent from my Samsung device -------- Original message -------- From: "Ns, Rushi" Date: 04/02/2017 02:19 (GMT+00:00) To: Simon Briggs Cc: Michal Jura, Nicolas Bock Subject: Re: Weekly review of SAP Big Data SOC 7 testing Hi Simon, Just to give you an update: the Horizon issue was resolved by changing Neutron from OPENVSWITCH to LinuxBridge, as suggested by Nick. Now I need to move forward with SAHARA, which I can try, but if I run into issues I might need some expertise from someone on your team with SAHARA knowledge. Regarding the other request, Magnum (kubernetes), I would like to discuss with Michal Jura (mjura at suse.com), whom I have CC'd here, as I was going through his github document https://github.com/mjura/kubernetes-demo but wasn't able to find the image he specified. Link to the image: http://download.suse.de/ibs/Devel:/Docker:/Images:/SLE12SP2-JeOS-k8s-magnum/images/sles-openstack-magnum-kubernetes.x86_64.qcow2 Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Simon Briggs Date: Friday, February 3, 2017 at 10:38 AM To: "Ns, Rushi" Subject: Re: Weekly review of SAP Big Data SOC 7 testing Hi, Sorry about delaying you. I will coordinate with Nick to get the best resource for you. Thanks Simon Sent from my Samsung device -------- Original message -------- From: "Ns, Rushi" Date: 03/02/2017 18:33 (GMT+00:00) To: Simon Briggs Subject: Re: Weekly review of SAP Big Data SOC 7 testing Hi Simon, Thank you. I waited on the call; however, the toll-free number is not a US number, so the call never went through (the toll-free number seems to be a UK one), but I stayed on GoToMeeting for 15 minutes and then disconnected. 
Sure, I will sync up with Nick and yes you are right it seems not aa code issue, however we are not sure which I will check with Nick in about 1 hour . Keep you posted. Also I need help on Magnum (kubernetes side as well) I see a person Michal Jura (mjura at suse.com) I spoke with Nick to bring Michal on another call to start the Magnum stuff. Can you try to arrange Michal to be with me next week for a short call after this Horizon issue fixed and SAHARA works only after I will work with Michal Jura. Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Simon Briggs > Date: Friday, February 3, 2017 at 10:28 AM To: "Ns, Rushi" > Subject: Re: Accepted: Weekly review of SAP Big Data SOC 7 testing Hi Rushi, I'm afraid because I'm used to finishing at dinner on Fridays and so it slipped my mind that we had a 6pm arranged. Sorry. I am available now to talk if you want, though I have spoken to Nick and he advised he has tested your Horizon setup and it works OK on his replica environment of what you have. With this situation we can only work with the premises that the Horizon issue is not a code problem but is local to your configuration. He did say he was going to try and help you today on this matter. Did this help? Kind regards Simon Briggs Sent from my Samsung device -------- Original message -------- From: "Ns, Rushi" > Date: 02/02/2017 14:22 (GMT+00:00) To: Simon Briggs > Subject: Accepted: Weekly review of SAP Big Data SOC 7 testing -- Les gens heureux ne sont pas press?s. -- Johannes Grassler, Cloud Developer SUSE Linux GmbH, HRB 21284 (AG N?rnberg) GF: Felix Imend?rffer, Jane Smithard, Graham Norton Maxfeldstr. 5, 90409 N?rnberg, Germany -- Johannes Grassler, Cloud Developer SUSE Linux GmbH, HRB 21284 (AG N?rnberg) GF: Felix Imend?rffer, Jane Smithard, Graham Norton Maxfeldstr. 5, 90409 N?rnberg, Germany -- Les gens heureux ne sont pas press?s. -- Johannes Grassler, Cloud Developer SUSE Linux GmbH, HRB 21284 (AG N?rnberg) GF: Felix Imend?rffer, Jane Smithard, Graham Norton Maxfeldstr. 5, 90409 N?rnberg, Germany -- Johannes Grassler, Cloud Developer SUSE Linux GmbH, HRB 21284 (AG N?rnberg) GF: Felix Imend?rffer, Jane Smithard, Graham Norton Maxfeldstr. 5, 90409 N?rnberg, Germany -- Mit freundlichen Gr??en / Best regards Carsten Duch Sales Engineer SUSE N?rdlicher Zubringer 9-11 40470 D?sseldorf (P)+49 173 5876 707 (H)+49 521 9497 6388 carsten.duch at suse.com -- SUSE Linux GmbH, GF: Felix Imend?rffer, Jane Smithard, Graham Norton, HRB 21284 (AG N?rnberg) -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 2963 bytes Desc: image001.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.png Type: image/png Size: 1199 bytes Desc: image002.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image003.png Type: image/png Size: 796 bytes Desc: image003.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image004.png Type: image/png Size: 770 bytes Desc: image004.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image005.png Type: image/png Size: 762 bytes Desc: image005.png URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image006.png Type: image/png Size: 950 bytes Desc: image006.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image007.png Type: image/png Size: 808 bytes Desc: image007.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image008.png Type: image/png Size: 2965 bytes Desc: image008.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image009.png Type: image/png Size: 1193 bytes Desc: image009.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image010.png Type: image/png Size: 798 bytes Desc: image010.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image011.png Type: image/png Size: 772 bytes Desc: image011.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image012.png Type: image/png Size: 764 bytes Desc: image012.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image013.png Type: image/png Size: 952 bytes Desc: image013.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image014.png Type: image/png Size: 810 bytes Desc: image014.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image015.png Type: image/png Size: 82375 bytes Desc: image015.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image016.png Type: image/png Size: 47411 bytes Desc: image016.png URL: From vneuhauss at suse.com Tue Nov 21 09:02:27 2017 From: vneuhauss at suse.com (vinicius) Date: Tue, 21 Nov 2017 14:02:27 -0200 Subject: [caasp-beta] CaaS 2 root password. Message-ID: <8fc5fe4b-6160-d12e-5353-84145dd5cb04@suse.com> Hello All. I Have two questions about CaaS: 1) How can I change the root password ? 2) Ex. I have 2 CaaS structures individually? with the same logical configuration and physical configuration like 3 masters and 3 workers for infra A and the same configuration for infra B , If I want remove one worker node from A and re-insert this node in infra B structure without reinstall the node is possible? From rushi.ns at sap.com Tue Nov 21 09:05:28 2017 From: rushi.ns at sap.com (Ns, Rushi) Date: Tue, 21 Nov 2017 16:05:28 +0000 Subject: [caasp-beta] kubeconfig download error with DEX internal server error , Login error In-Reply-To: <9C1A5FD7-45DA-4C2F-9F89-B91ACEF9436D@sap.com> References: <090536B7-4B1A-4425-89C0-B24F589F7606@sap.com> <90BF4755-A3AB-4F6D-9463-5AFEC7F5DF65@suse.com> <9C1A5FD7-45DA-4C2F-9F89-B91ACEF9436D@sap.com> Message-ID: <22489E4B-B6E2-4FE5-A2FB-6D16C17E733B@sap.com> Hi Rob, I have filed the bug since I wasn?t able to access the bug you filed. Bug 1069251 has been added to the database Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Rushi NS Date: Tuesday, November 21, 2017 at 7:59 AM To: Rob de Canha-Knight , Vincent Moutoussamy Cc: Simon Briggs , Vincent Untz , "caasp-beta at lists.suse.com" Subject: Re: kubeconfig download error with DEX internal server error , Login error Hi Rob, Thanks for detailed information as well filing the bug. 
First of all , my apologies reaching you guys directly, since the issue is rendering for many months (first we had CAASP-CLI issue with authorization and now kubeconfig download issue with DEX) , I thought I can get answers or solutions directly from SUSE guys rather than beta users which are outside SUSE. This is the main reason to contact you guys. With your information I will file bugs going forward and reach only the beta users. Thank you for your suggestion and help filing the bug behalf of me. I have my account with your bug system (https://bugzilla.suse.com/show_bug.cgi?id=1069175) and I have tried to search the bug you filed ang I get this error ?You are not authorized to access bug #1069175? , ? Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Rob de Canha-Knight Date: Tuesday, November 21, 2017 at 6:08 AM To: Rushi NS , Vincent Moutoussamy Cc: Simon Briggs , Vincent Untz , "caasp-beta at lists.suse.com" Subject: Re: kubeconfig download error with DEX internal server error , Login error Rushi, You emailed the list about this issue yes I can see that. However, investigating these things takes time. The engineering team need time to investigate it. Please be patient. Vincent is unable to help you with any technical issues he is our beta program manager for all SUSE Beta Programs and will just forward the email to the list again. I can see Martin emailed back this morning with some potential steps to follow that may help. I have attached them here for your convenience. Please attempt them and report back to the caasp-beta at lists.suse.com email He also logged bug ID 1069175 for you with this issue. I have asked you on numerous occasions to log a bug report before and this is now there. If you have not done already please create a Bugzilla account with your rushi.ns at sap.com email so I can add you as a CC to the bug (which will get you updates whenever anyone else adds comments to the bug). If you have already logged a bug and I cannot find it then great; please email caasp-beta at lists.suse.com with the Bugzilla ID number and someone will take a look for you. As I have suggested to you directly before, Martin is asking you to check the value entered into the External FQDN field in Velum is the correct one for your cluster. I asked you to do the same the next time you built a cluster but never heard back and I think you emailed someone else on the mailing list directly. We ask for the bug reports as they go straight to engineering. Emailing myself, Vincent or Simon about the issue without including caasp-beta at lists.suse.com will not make you any progress as we all end up getting different versions of the story without any diagnostic history. If the process is not followed correctly then we end up in the situation we are in now; where various people are getting the same emails from you without the information requested and no bug report logged. Now the bug has been logged it will be investigated but unless you create an account on the SUSE Bugzilla you will not be able to see it. Once you?ve created an account please let the caasp-beta at lists.suse.com list know and we can add you to the bug Martin logged on your behalf and you can continue diagnostics there. Please do not email myself, Simon or Vincent directly again about this issue or remove the caasp-beta at lists.suse.com list from the CC as this makes the email thread very hard to follow and will make the whole process take longer. 
Emailing random SUSE employees about an issue with no history of the issue or any diagnostic information requested is only going to slow things down in the long run and make it harder for our engineers to help you. Now we have a bug logged for you someone soon will email you and the caasp-beta at lists.suse.com with something to try or asking for some diagnostic info. Please do provide it and leave the caasp-beta at lists.suse.com email on CC as this gives your email the widest possible audience and best opportunity for someone to help. Thank you for your patience, Rob ----- Rob de Canha-Knight EMEA Platform and Management Technical Strategist SUSE rob.decanha-knight at suse.com (P) +44 (0) 1635 937689 (M) +44 (0) 7392 087303 (TW) rssfed23 ---- [cid:image001.png at 01D3629F.85DA3A70] [cid:image002.png at 01D3629F.85DA3A70] [cid:image003.png at 01D3629F.85DA3A70] [cid:image004.png at 01D3629F.85DA3A70] [cid:image005.png at 01D3629F.85DA3A70] [cid:image006.png at 01D3629F.85DA3A70] [cid:image007.png at 01D3629F.85DA3A70] From: "Ns, Rushi" Date: Monday, 20 November 2017 at 23:20 To: Rob de Canha-Knight , Vincent Moutoussamy Cc: Simon Briggs , Vincent Untz Subject: kubeconfig download error with DEX internal server error , Login error Hi Rob, I did to reach betalist email but I wasn?t getting any response. Now I am stuck with this DEX error . Can someone from your team can help. We are getting lot of requests to build with SUSE CAASP as you guys already certified with SAP VORA and this becames a show stopper to me with this error. https://www.suse.com/communities/blog/sap-vora-2-0-released-suse-caasp-1-0/ @Vincent Moutoussamy: Can you help here is the problem ======================= I built the cluster with latest SUSE CAASP 2.0 and was getting errors with dex authentication when downloading kubeconfig file from velum webinterface. Did anyone experience this error. I did multiple setups (multi master and single master) but both clusters have the same error. My initial thought of this error with multi master setup (I have setup with multi master first ), however even with single master I got the same error, so not sure if this a bug but I can?t download kubeconfig file from velum. I got this error -------- ?internal server error , Login error? ------ My login to velum works fine with the same credentials, however for download kubeconfig file the authentication is failing . Let me know if anyone experience the same. Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Rob de Canha-Knight Date: Tuesday, November 14, 2017 at 2:56 PM To: Rushi NS Cc: Simon Briggs , Vincent Untz Subject: Re: KUBEADM method install kubernetes clusters on SUSE 12 SP1/SP2 Rushi, As advised in the previous mail I?m unable to provide any additional support to you on this matter and I have to direct you to obtain support through the usual channels for any additional queries. So please reach out to the caasp-beta mailing list or use Bugzilla to log a bug for investigation if you think the process is being followed correctly as we have not seen this issue internally during 2.0 testing in any of our environments or other beta user environments so we would appreciate the bug report so it can be investigated and fixed by our engineering team if it is indeed a problem with the product. 
Please note though that due to our HackWeek that we run at SUSE this week you may experience a slightly delayed response to both the caasp-beta mailing list as well as anything put through Bugzilla, as in effect our product and engineering teams are off this week. Rob ----- Rob de Canha-Knight EMEA Platform and Management Technical Strategist SUSE rob.decanha-knight at suse.com (P) +44 (0) 1635 937689 (M) +44 (0) 7392 087303 (TW) rssfed23 ---- From: "Ns, Rushi" Date: Tuesday, 14 November 2017 at 18:05 To: Rob de Canha-Knight Cc: Simon Briggs , Vincent Untz Subject: Re: KUBEADM method install kubernetes clusters on SUSE 12 SP1/SP2 Hi Rob, Did you get a chance to check my mail, and is there any solution to this problem? Do you think this is a bug in the release? Like I said, I have tried multi master as well as single master; in both iterations the error is the same. Do you think there are any proxy issues? As you know the systems are behind a proxy and I use proxy parameters during the setup. Here is my screenshot of the proxy settings. Let me know if there is any way to fix this. I can share my screen if you have a few minutes. This is really killing my team, as I need to set up a SUSE-based kubernetes, which I was trying to do with KUBEADM, but I am still hoping CAASP will overcome the issues with the KUBEADM alternative; it is not going as per my expectations. Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Rushi NS Date: Friday, November 10, 2017 at 3:09 PM To: Rob de Canha-Knight Cc: Simon Briggs , Vincent Untz Subject: Re: KUBEADM method install kubernetes clusters on SUSE 12 SP1/SP2 Hi Rob, I have tried using the Dashboard host as the admin node as you mentioned (velum host); after doing everything I got the same error. I think this could be a problem with multi master. I did another test with single master and it has the same error. Not sure where this error comes from, but I did everything correctly based on your suggestion. Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Rushi NS Date: Friday, November 10, 2017 at 11:59 AM To: Rob de Canha-Knight Cc: Simon Briggs , Vincent Untz Subject: Re: KUBEADM method install kubernetes clusters on SUSE 12 SP1/SP2 Hi Rob, OK, got it. Because of multi master I do require round robin, either via the admin node or something with a load balancer. Let me try this fix by rebuilding with multi master, and if it fails then I will try with single master. Keep you posted. Have a nice weekend. Best Regards, Rushi. Success is not a matter of being the best & winning the race. Success is a matter of handling the worst & finishing the race Sent from my iPhone please excuse typos and brevity On Nov 10, 2017, at 11:30, Rob de Canha-Knight > wrote: In the k8s external fqdn box that must be a load balancer set up externally from the cluster if doing multi-master. The external dashboard fqdn must be the value of the fqdn that velum is running on on the admin node. If your admin node is lvsusekub1 then put that in there. Doing multi-master on bare metal requires a loadbalancer and it's that loadbalancer address that goes in the top box. 
If you don't have a loadbalancer then you can put in any of the master node fqdns and it will work. So put lvsusekub3 in the top box and lvsusekub1 in the bottom box and you can do round robin DNS on your dns server. It's worth noting that once you enter those values they are fixed, and to change them you have to rebuild the cluster from scratch. If this is a development environment I recommend using a single master node and putting that value in the top box and the admin node fqdn in the bottom box. Start simple and build up from there. I'm signing off now for the weekend. Have a good weekend. ----- Rob de Canha-Knight EMEA Platform and Management Technical Strategist SUSE rob.decanha-knight at suse.com (P) +44 (0) 1635 937689 (M) +44 (0) 7392 087303 (TW) rssfed23 ---- From: "Ns, Rushi" > Date: Friday, 10 November 2017 at 19:24 To: Rob de Canha-Knight > Cc: Simon Briggs >, Vincent Untz > Subject: Re: KUBEADM method install kubernetes clusters on SUSE 12 SP1/SP2 OK, I agree on some points (two boxes only; I put the same hostname in both boxes, "lvsusekub3" and lvsusekube3.pal.sap.corp). I set up 3 masters as I mentioned before, and this host LVSUSEKUB3 is the first master node hostname. I did make sure everything was right except the FQDN. Question: what hostname should I put in the second box? My admin node: lvsusekub1 Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Rob de Canha-Knight > Date: Friday, November 10, 2017 at 11:19 AM To: Rushi NS > Cc: Simon Briggs >, Vincent Untz > Subject: Re: KUBEADM method install kubernetes clusters on SUSE 12 SP1/SP2 I've identified your problem. The first box is the k8s API endpoint. This field has to be set to the kubernetes master fqdn. I think you have it set to your admin node fqdn and that's why things are not working. You'll have to destroy your cluster and make sure that the top field in your screenshot has the fqdn of the k8s master node, not the admin node (those two boxes must have different addresses in them). ----- Rob de Canha-Knight EMEA Platform and Management Technical Strategist SUSE rob.decanha-knight at suse.com (P) +44 (0) 1635 937689 (M) +44 (0) 7392 087303 (TW) rssfed23 ---- From: "Ns, Rushi" > Date: Friday, 10 November 2017 at 19:17 To: Rob de Canha-Knight > Cc: Simon Briggs >, Vincent Untz > Subject: Re: KUBEADM method install kubernetes clusters on SUSE 12 SP1/SP2 Hi Rob, Answer to your queries. "You must make sure that you are accessing velum from the right FQDN - the one you gave velum during the setup process when it asks for the internal and external dashboard FQDN." I set this during the API FQDN step. I did make sure no plugins block scripts (JavaScript). Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Rob de Canha-Knight > Date: Friday, November 10, 2017 at 11:13 AM To: Rushi NS > Cc: Vincent Untz >, Simon Briggs > Subject: Re: KUBEADM method install kubernetes clusters on SUSE 12 SP1/SP2 You must make sure that you are accessing velum from the right FQDN - the one you gave velum during the setup process when it asks for the internal and external dashboard FQDN. Aside from that, make sure you've not got any browser plugins that are blocking scripts or javascript from running. If you still cannot get it to work then you will have to wait for the 2.0 final release next week and try that. If you run into issues there I cannot help, as it doesn't fall into my role, and you'll have to use the official channels for support. 
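As a rough illustration of the two fields discussed above, the values can be sanity-checked from any workstation before bootstrapping. This is only a hedged sketch: the hostnames are the example ones from this thread, and the ports assumed are the standard ones (6443 for the Kubernetes API, 443 for Velum), so adjust to your own setup.

API_FQDN=lvsusekub3.pal.sap.corp        # "top box": external Kubernetes API FQDN (or the load balancer in front of the masters)
DASHBOARD_FQDN=lvsusekub1.pal.sap.corp  # "bottom box": admin node FQDN where Velum runs

# Both names must resolve; with round robin DNS the API name may return several A records
dig +short "$API_FQDN"
dig +short "$DASHBOARD_FQDN"

# Both endpoints should answer over TLS (certificate warnings are expected with -k)
curl -k -s -o /dev/null -w "API %{http_code}\n"   "https://$API_FQDN:6443/"
curl -k -s -o /dev/null -w "Velum %{http_code}\n" "https://$DASHBOARD_FQDN/"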
----- Rob de Canha-Knight EMEA Platform and Management Technical Strategist SUSE rob.decanha-knight at suse.com (P) +44 (0) 1635 937689 (M) +44 (0) 7392 087303 (TW) rssfed23 ---- From: "Ns, Rushi" > Date: Friday, 10 November 2017 at 19:09 To: Rob de Canha-Knight > Cc: Vincent Untz > Subject: Re: KUBEADM method install kubernetes clusters on SUSE 12 SP1/SP2 Thanks. I did the setup with 3 master and 1 minions and its worked nicely but while downloading kubectl file the authentication I set during velum setup is not accepted and I get error downloading the kubectl file > Also I got the error you stated (not being able to talk to the velum API. When this happens please refresh your browser page and accept the new certificate.) I refresh but I didn?t get any where accept new certificate but all worked. Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Rob de Canha-Knight > Date: Friday, November 10, 2017 at 10:32 AM To: Rushi NS > Cc: Vincent Untz > Subject: Re: KUBEADM method install kubernetes clusters on SUSE 12 SP1/SP2 It supports multi master and yes; your precious mail is correct. Sent from my iPhone - please excuse any shortness On 10 Nov 2017, at 18:29, Ns, Rushi > wrote: Hi Rob, Is this release supports multi master (controllers ? etcd) or single master. Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Rushi NS > Date: Friday, November 10, 2017 at 10:17 AM To: Rob de Canha-Knight > Cc: Vincent Untz > Subject: Re: KUBEADM method install kubernetes clusters on SUSE 12 SP1/SP2 Hi Rob, Perfect and Thanks, I just downloaded and will start deploying and keep you posted. As I understand 2.0 is removed the caasp-cli authentication ? and everything should work as it was before with 1.0 using kubeconfig file downloaded from VELUM web. Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Rob de Canha-Knight > Date: Friday, November 10, 2017 at 10:01 AM To: Rushi NS > Cc: Vincent Untz > Subject: Re: KUBEADM method install kubernetes clusters on SUSE 12 SP1/SP2 November 16th However; you can download our latest release candidate ISO from https://drive.google.com/file/d/1ZO0sduyV5GS3WThl0eLVjnMNHCaFIi5u/view?usp=sharing which doesn?t require you to use caasp-cli. One note; during the bootstrap process you will get an error at the top about not being able to talk to the velum API. When this happens please refresh your browser page and accept the new certificate. Once you have done this it will be able to talk to the API and you?re good to go. To obtain the kubeconfig file you click the button and this will redirect you to a new login page where you enter in your caas platform admin account credentials and it will offer your browser a download of the kubeconfig that has the correct client certificate in it. Many thanks, Rob ----- Rob de Canha-Knight EMEA Platform and Management Technical Strategist SUSE rob.decanha-knight at suse.com (P) +44 (0) 1635 937689 (M) +44 (0) 7392 087303 (TW) rssfed23 ---- From: "Ns, Rushi" > Date: Friday, 10 November 2017 at 17:58 To: Rob de Canha-Knight > Cc: Vincent Untz > Subject: Re: KUBEADM method install kubernetes clusters on SUSE 12 SP1/SP2 HI Rob, What is the ETA for 2.0 release ? Best Regards, Rushi. 
I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Rob de Canha-Knight > Date: Tuesday, November 7, 2017 at 2:32 PM To: Rushi NS > Cc: Carsten Duch >, Johannes Grassler >, Michal Jura >, Nicolas Bock >, Simon Briggs >, Vincent Untz > Subject: Re: KUBEADM method install kubernetes clusters on SUSE 12 SP1/SP2 Thanks Rushi - yes sticking with CaaSP will make your life much easier and enable you to get support as well once a suitable support contract/agreement is in place. When 2.0 is released we will have an updated user manual and deployment guide in the usual place (https://www.suse.com/documentation/suse-caasp/index.html) for you to consume so don?t worry you won?t get in any trouble :) Rob Sent from my iPhone - please excuse any shortness On 7 Nov 2017, at 23:27, Ns, Rushi > wrote: Hi Rob, Thank you. Yes, I am sticking to ?CAASP? only , since had issues with authorization I wanted to try out with kubeadm to setup a cluster for our DMZ internet facing for federation. KUBEADM is working but its pain as CAASP works nice with everything based on PXE which is what I would like to have in my future builds. If you say the 2.0 is coming out next, then I will wait . please provide the doucemntation how you consume 2.0 , so that I don?t get any trouble. Thank you so much for your quick reply. Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE On 11/7/17, 2:22 PM, "Rob de Canha-Knight" > wrote: Hi Rushi. As mentioned on the thread I just sent you; the method Simon is referring to there is the manual upstream way to deploy Kubernetes. It is separate and very different from CaaSP and is completely unsupported in every way. As such; we cannot help you here with the kubeadm way in any way shape or form. Please stick with CaaSP for now if you can or want assistance from us. The version that doesn?t require you to use caasp-cli will be released by the end of next week (2.0 final) and you will be able to deploy that successfully and if you run into any issues we can help you. As a side note I kindly request that you use the CaaSP-beta mailing list for your queries as you did in the past or log a support ticket when you run into issues with the final release. You are likely to get a better response faster than emailing our product team directly plus the knowledge will be archived publicly for everyone else to benefit. Many thanks, Rob Sent from my iPhone - please excuse any shortness On 7 Nov 2017, at 23:13, Ns, Rushi > wrote: Hello Simon, How are you . Long time. I have some Question. Not sure if you can answer. As you know we are doing test of ?CAASP? from SUSE , however it is bit pain as CAASP-CLI authentication is boiling down the cluster without access. Rob is aware what I was talking. Since the CAASP is still issue with CAASP-CLI , I was thinking if SLES12 SP1 can work with KUBEADM method to install cluster. Did anyone tried from your side. I found this link but not sure https://forums.suse.com/archive/index.php/t-9637.html. Do you know who is ?simon (smflood)? is that you :( on the above link , he said he did install with KUBEADM using SLES 12 SP1 and SP2 where he has given images links to https://software.opensuse.org/download.html?project=Virtualization%3Acontainers&package=kubernetes can someone help me if KUBEADM method to insetall kubernetes cluster on SUSE 12 SP1/Sp2. Best Regards, Rushi. 
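For reference, the upstream kubeadm flow Rushi is asking about is roughly the following. This is only a hedged sketch of the generic upstream procedure, not a supported CaaSP path, and the zypper package names are assumptions based on the openSUSE Virtualization:containers project linked above.

# Install a container runtime and the kubernetes tools (package names are assumptions)
sudo zypper install docker kubernetes-kubeadm kubernetes-kubelet kubernetes-client
sudo systemctl enable --now docker kubelet

# On the machine chosen as master
sudo kubeadm init --pod-network-cidr=10.244.0.0/16

# On each worker, with the token and CA hash printed by "kubeadm init"
# (flag names match kubeadm 1.8+; older releases used a shorter join syntax)
sudo kubeadm join <master-fqdn>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>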
I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE On 3/14/17, 2:26 AM, "Simon Briggs" > wrote: Hi Rushi, I am part of the team delivering our Expert Day in Rome today so cannot make a call, but I want to make sure things are progressing for you. Please advise if Michal's advise worked or if you have new challenges we can help with. Thanks Simon Briggs On 10/03/17 09:10, Simon Briggs wrote: Hi Rushi, AJ has answered the CaaSP question. Bit I can help explain that SOC7 is now fully GA and can be downloaded freely from the https://www.suse.com/download-linux/ Cloud click through. Thanks Simon On 09/03/17 21:54, Ns, Rushi wrote: Hi Michaal, Any update on this. I am eagerly waiting for the change as I wil start the setup again when SOC7 GA comes out. @Vincent: Do you know when SOC7 GA comes out . Also CaaS Beta ? Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE On 2/23/17, 7:14 AM, "Ns, Rushi" > wrote: Hi Michal, Good to hear that it's doable, yes please test at your end and let me know. I will wait for your confirmation and procedure how to consume our designated SDN vlan. Best Regards, Rushi. Success is not a matter of being the best & winning the race. Success is a matter of handling the worst & finishing the race Sent from my iPhone please excuse typos and brevity On Feb 23, 2017, at 03:04, Michal Jura > wrote: Hi Rushi, It should be possible to use VLAN ID 852 for Magnum private network. You should configure network with name private in advance with vlan ID 852, but I have to test it first. Changing subnet to 192.168.x.x should be durable too, but I have to check it. Please give me some time and I will come back to you. Best regards, Michal On 02/22/2017 11:01 PM, Ns, Rushi wrote: Hi Carsten,. Thank you. As you know we have VLAN ID *852* as SDN in network.json which is already in our switch level. Here I have question or suggestion. Can I use this VLAN 852 for Magnum side as L2 traffic ? we do not want to use 10.x.x.x IP space, so we use non-routable 192.168.x.x kind of IP space which will route through our 852 VLAN . Is it possible to define this in Heat Template, so that cluster deployment will generate 192.168.x.x subnet instead of 10.x.x.x subnet when a kubernetes cluster created? Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE *From: *Carsten Duch > *Date: *Wednesday, February 22, 2017 at 10:21 AM *To: *"Ns, Rushi" >, Johannes Grassler >, Michal Jura >, Vincent Untz > *Cc: *Nicolas Bock >, Simon Briggs > *Subject: *AW: Weekly review of SAP Big Data SOC 7 testing Hi Rushi, Theater Problem is that you have configured it to use the vlans from 222 to 2222. You have to choose a range which is allowed on the Trunk port and not already in use. If you want to change the starting Point you have to redeploy the whole cloud and provide the correct vlan id when editing the network.json. So without that, you are only able to change the max number up to a value you are able to use. Maybe 50 for 222 to 272. Or try vxlan instead of vlan again. But I think that the overall problem is a misconfigured switch. Make sure that all vlan ids are allowed for the Trunk and you will have a good chance that it works. Von meinem Samsung Galaxy Smartphone gesendet. 
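To see which segmentation ID Neutron actually picked for a given tenant network, and therefore which VLAN has to be allowed on the trunk as Carsten explains above, the admin CLI can be queried. A minimal sketch, assuming admin credentials are sourced and that the Magnum private network is simply named "private"; the credentials file path is an assumption.

source ~/.openrc   # admin credentials
# Show the network type and the VLAN (segmentation ID) Neutron assigned to this network
openstack network show private -c "provider:network_type" -c "provider:segmentation_id"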
-------- Urspr?ngliche Nachricht -------- Von: "Ns, Rushi" > Datum: 22.02.17 19:04 (GMT+01:00) An: Carsten Duch >, Johannes Grassler >, Michal Jura >, Vincent Untz > Cc: Nicolas Bock >, Simon Briggs > Betreff: Re: Weekly review of SAP Big Data SOC 7 testing HI Carsten Yes I am aware as we discussed this during our call and after reading your response, however the vlan 222-322 is already used in our production particularly 271 is our Laptop VLAN (All employees get IP address of the Laptops ) which we cannot use it for this. I am looking for alternatives. Let me know if you have any idea other than this 222-322 allow ? Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE On 2/21/17, 10:38 PM, "Carsten Duch" > wrote: Hi Rushi, have you tried to configure your switch according to my email from 14th? Maybe you didn't got the mail? I suggested the following configuration on the switch: Your are using linuxbridge with vlan. Make sure to allow tagging of VLANs on the switch and add the range to the allowed VLANs for the TRUNK. The range is defined by your fixed vlan and the maximum number of VLANs. starting point: fixed VLAN id = 222 + Maximum Number of VLANs configured in the Neutron Barclamp= 2000 That means that you have to allow a range from 222 to 2222 on your switch side. But I would recommend to reduce the Maximum so that it will not overlap with other existing VLANs. You can reduce it to 100 or something lower and then allow a range from 222 to 322 for the TRUNK Port. You don't need to create all the VLANs manually but you need to allow VLAN tagging for the Port and allow a range. Depending on your switch,the configuration should look something like: switchport trunk allow vlan 222-322 http://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus5000/sw/configuration/guide/cli/CLIConfigurationGuide/AccessTrunk.html Make sure to allow all the VLANs from your network.json for the TRUNK Port. On 21.02.2017 23:40, Ns, Rushi wrote: Hi Michal, Yes, that?s obviously the root cause I found before your email but it is cumbersome to understand the flow of the segmentation ID which I need to discuss how we can overcome. What I observe is, every time I create new cluster the private network generates a new segment ID:: 271, 272, 273 like that?(this is like VLAN) which our floating VLAN should be able to reach only when we add this segment ID (dummy ID 231,232 or whatever generates) to our swith level as real VLAN otherwise the private network subnet cannot reach to floating IP . check attached picture contains the information of segmentation ID: I remember I had one session with one of your SuSE person (carsten.duch at suse.com) recently I shared my system screen and we discussed this network segment issue (Software Defined Networ) and he answered some of that , however it appeared its beyond is knowledge. I have CC?d Carsten here., so you can talk to him. Do you have any idea what needs to be done on the physical network swtich level where the VLANs already connected but not this VLAN (271, 272,whatever) because this is really not easy to allow in real network switch configuration of the VLAN to allow this trunked port which doesn?t exist at all. We had the same issue before in deploying cloud foundry on top of openstack and we fool the switch with the private segment ID created and at the end we found this is a bug in openstack SDN side. Let me know what needs to be done and I can do that. Best Regards, Rushi. 
I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE On 2/21/17, 4:27 AM, "Michal Jura" > wrote: Hi, This problem looks like there is no connection from private network where kube-master and kube-mionions are launched to Heat PublicURL endpoint. Please fix network configuration. On 02/20/2017 08:10 PM, Johannes Grassler wrote: Hello Rushi, alright, so we are creating a cluster now but the Kubernetes master fails to signal success to the Heat API (that's what WaitConditionTimeout means). Unfortunately this is where debugging becomes fairly hard...can you ssh to the cluster's Kubernetes master and get me /var/log/cloud-init.log and /var/log/cloud-init-output.log please? Maybe we are lucky and find the cause of the problem in these logs. If there's nothing useful in there I'll probably have to come up with some debugging instrumentation next... Cheers, Johannes On 02/20/2017 07:53 PM, Ns, Rushi wrote: Hi Johannes, Thanks, I just tried with the changes you mentioned and I see that it made some progress this time (creating private network subnet, heat stack and instance as well cluster ) , however after some time it failed with ? CREATE_FAILED? status. Here is the log incase if you want to dig in more. ================ 2017-02-20 09:01:18.148 92552 INFO oslo.messaging._drivers.impl_rabbit [-] [6c39b368-bbdf-40cd-b1a5-b14da062f692] Reconnected to AMQP server on 10.48.220.40:5672 via [amqp] clientwith port 36265. 2017-02-20 10:36:25.914 92552 INFO magnum.conductor.handlers.cluster_conductor [req-dddc4477-407f-4b84-afbd-f8b657fd02c6 admin openstack - - -] The stack None was not found during cluster deletion. 2017-02-20 10:36:26.515 92552 WARNING magnum.common.cert_manager.local_cert_manager [req-dddc4477-407f-4b84-afbd-f8b657fd02c6 admin openstack - - -] Deleting certificate e426103d-0ecf-4044-9383-63305c667a c2 from the local filesystem. CertManager type 'local' should be used for testing purpose. 2017-02-20 10:36:26.517 92552 WARNING magnum.common.cert_manager.local_cert_manager [req-dddc4477-407f-4b84-afbd-f8b657fd02c6 admin openstack - - -] Deleting certificate a9a20d33-7b54-4393-8385-85c4900a0f 79 from the local filesystem. CertManager type 'local' should be used for testing purpose. 2017-02-20 10:37:39.905 92552 WARNING magnum.common.cert_manager.local_cert_manager [req-d50c84af-7eca-4f76-8e2b-dc49933d0376 admin openstack - - -] Storing certificate data on the local filesystem. CertM anager type 'local' should be used for testing purpose. 2017-02-20 10:37:40.049 92552 WARNING magnum.common.cert_manager.local_cert_manager [req-d50c84af-7eca-4f76-8e2b-dc49933d0376 admin openstack - - -] Storing certificate data on the local filesystem. CertM anager type 'local' should be used for testing purpose. 2017-02-20 10:48:48.172 92552 ERROR magnum.conductor.handlers.cluster_conductor [req-ac20eb45-8ba9-4b73-a771-326122e94ad7 522958fb-fd7c-4c33-84d2-1ae9e60c1574 - - - -] Cluster error, stack status: CREATE_ FAILED, stack_id: e47d528d-f0e7-4a40-a0d3-12501cf5a984, reason: Resource CREATE failed: WaitConditionTimeout: resources.kube_masters.resources[0].resources.master_wait_condition: 0 of 1 received 2017-02-20 10:48:48.510 92552 INFO magnum.service.periodic [req-ac20eb45-8ba9-4b73-a771-326122e94ad7 522958fb-fd7c-4c33-84d2-1ae9e60c1574 - - - -] Sync up cluster with id 15 from CREATE_IN_PROGRESS to CRE ATE_FAILED. Best Regards, Rushi. 
I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE On 2/20/17, 9:51 AM, "Johannes Grassler" > wrote: Hello Rushi, I took a closer look at the SUSE driver and `--discovery-url none` will definitely take care of any etcd problems. The thing I'm not quite so sure about is the registry bit. Can you please try the following... magnum cluster-template-create --name k8s_template\ --image-id sles-openstack-magnum-kubernetes \ --keypair-id default \ --external-network-id floating \ --dns-nameserver 8.8.8.8 \ --flavor-id m1.magnum \ --master-flavor-id m1.magnum \ --docker-volume-size 5 \ --network-driver flannel \ --coe kubernetes \ --floating-ip-enabled \ --tls-disabled \ --http-proxy http://proxy.pal.sap.corp:8080 magnum cluster-create --name k8s_cluster \ --cluster-template k8s_template \ --master-count 1 \ --node-count 2 \ --discovery-url none ...and see if that yields a working cluster for you? It still won't work in a completely disconnected environment, but with the proxy you have in place it should work. Some explanation: the --discovery-url none will disable the validation check that causes the GetDiscoveryUrlFailed error, allowing Magnum to instantiate the Heat template making up the cluster. The --http-proxy http://proxy.pal.sap.corp:8080 will then cause the cluster to try and access the Docker registry through the proxy. As far as I understand our driver, the --registry-enabled --labels registry_url=URL will require you to set up a local docker registry in a network reachable from the Magnum bay's instances and specify a URL pointing to that docker registry. I'd rather not ask you to do that if access through the proxy turns out to work. Cheers, Johannes On 02/20/2017 04:23 PM, Ns, Rushi wrote: Hi Johannes, I have also added https_proxy parameter thought it might need both (http and https) but even that failed too. I see the log expected to have discovery etcd. magnum cluster-template-create --name k8s_template --image-id sles-openstack-magnum-kubernetes --keypair-id default --external-network-id floating --dns-nameserver 8.8.8.8 --flavor-id m1.magnum --master-flavor-id m1.magnum --docker-volume-size 5 --network-driver flannel --coe kubernetes --floating-ip-enabled --tls-disabled --http-proxy http://proxy.pal.sap.corp:8080 --https-proxy http://proxy.pal.sap.corp:8080 magnum-conductor.log ===================== 2017-02-20 07:17:50.390 92552 ERROR oslo_messaging.rpc.server discovery_endpoint=discovery_endpoint) 2017-02-20 07:17:50.390 92552 ERROR oslo_messaging.rpc.server GetDiscoveryUrlFailed: Failed to get discovery url from 'https://discovery.etcd.io/new?size=1'. 2017-02-20 07:17:50.390 92552 ERROR oslo_messaging.rpc.server Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE On 2/20/17, 7:16 AM, "Ns, Rushi" > wrote: Hello Johannes, No luck even after adding the internet proxy at the time of cluster template creation and without specify anything at the cluster-create . The cluster create failed and this time I don?t see anything like , no heat stack created, no private kubernetes network subnet created and many. Here are the commands I tried. Let me know if this is how supposed to be used or am I doing something wrong. 
magnum cluster-template-create --name k8s_template --image-id sles-openstack-magnum-kubernetes --keypair-id default --external-network-id floating --dns-nameserver 8.8.8.8 --flavor-id m1.magnum --master-flavor-id m1.magnum --docker-volume-size 5 --network-driver flannel --coe kubernetes --floating-ip-enabled --tls-disabled --http-proxy http://proxy.pal.sap.corp:8080 magnum cluster-create --name k8s_cluster --cluster-template k8s_template --master-count 1 --node-count 2 this is the magnum-conductor.log I see something more needed . 2017-02-20 06:55:27.245 92552 ERROR magnum.drivers.common.template_def [-] HTTPSConnectionPool(host='discovery.etcd.io', port=443): Max retries exceeded with url: /new?size=1 (Caused by NewConnectionError (': Failed to establish a new connection: [Errno 113] EHOSTUNREACH',)) 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server [-] Exception during message handling 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server Traceback (most recent call last): 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 133, in _process_incoming 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message) 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 150, in dispatch 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args) 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 121, in _do_dispatch 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args) 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/magnum/conductor/handlers/cluster_conductor.py", line 165, in cluster_create 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server create_timeout) 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/magnum/conductor/handlers/cluster_conductor.py", line 97, in _create_stack 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server _extract_template_definition(context, cluster)) 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/magnum/conductor/handlers/cluster_conductor.py", line 82, in _extract_template_definition 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server scale_manager=scale_manager) 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/magnum/drivers/common/template_def.py", line 337, in extract_definition 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server self.get_params(context, cluster_template, cluster, **kwargs), 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/magnum/drivers/k8s_opensuse_v1/template_def.py", line 50, in get_params 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server extra_params['discovery_url'] = self.get_discovery_url(cluster) 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/magnum/drivers/common/template_def.py", line 445, in get_discovery_url 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server discovery_endpoint=discovery_endpoint) 2017-02-20 06:55:27.304 92552 ERROR 
oslo_messaging.rpc.server GetDiscoveryUrlFailed: Failed to get discovery url from 'https://discovery.etcd.io/new?size=1'. 2017-02-20 06:55:27.304 92552 ERROR oslo_messaging.rpc.server 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server [-] Can not acknowledge message. Skip processing 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server Traceback (most recent call last): 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 126, in _process_incoming 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server message.acknowledge() 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 119, in acknowledge 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server self.message.acknowledge() 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 251, in acknowledge 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server self._raw_message.ack() 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/kombu/message.py", line 88, in ack 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server self.channel.basic_ack(self.delivery_tag) 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 1584, in basic_ack 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server self._send_method((60, 80), args) 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 56, in _send_method 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server self.channel_id, method_sig, args, content, 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 221, in write_method 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server write_frame(1, channel, payload) 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/amqp/transport.py", line 188, in write_frame 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server frame_type, channel, size, payload, 0xce, 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 385, in sendall 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server tail = self.send(data, flags) 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 379, in send 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server return self._send_loop(self.fd.send, data, flags) 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 366, in _send_loop 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server return send_method(data, *args) 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server error: [Errno 104] Connection reset by peer 2017-02-20 06:55:27.309 92552 ERROR oslo_messaging.rpc.server 2017-02-20 06:55:27.310 92552 ERROR oslo.messaging._drivers.impl_rabbit [-] [6c39b368-bbdf-40cd-b1a5-b14da062f692] AMQP server on 10.48.220.40:5672 is unreachable: . Trying again in 1 seconds. 
Client port: 50462 2017-02-20 06:55:28.347 92552 INFO oslo.messaging._drivers.impl_rabbit [-] [6c39b368-bbdf-40cd-b1a5-b14da062f692] Reconnected to AMQP server on 10.48.220.40:5672 via [amqp] clientwith port 58264. 2017-02-20 06:59:09.827 92552 INFO magnum.conductor.handlers.cluster_conductor [req-9b6be3b8-d2fd-4e34-9d08-33d66a270fb1 admin openstack - - -] The stack None was not found during cluster deletion. 2017-02-20 06:59:10.400 92552 WARNING magnum.common.cert_manager.local_cert_manager [req-9b6be3b8-d2fd-4e34-9d08-33d66a270fb1 admin openstack - - -] Deleting certificate 105d39e9-ca2a-497c-b951-df87df2a02 24 from the local filesystem. CertManager type 'local' should be used for testing purpose. 2017-02-20 06:59:10.402 92552 WARNING magnum.common.cert_manager.local_cert_manager [req-9b6be3b8-d2fd-4e34-9d08-33d66a270fb1 admin openstack - - -] Deleting certificate f0004b69-3634-4af9-9fec-d3fdba074f 4c from the local filesystem. CertManager type 'local' should be used for testing purpose. 2017-02-20 07:02:37.658 92552 WARNING magnum.common.cert_manager.local_cert_manager [req-0d9720f7-eb3e-4c9f-870b-e26feb26b9e2 admin openstack - - -] Storing certificate data on the local filesystem. CertM anager type 'local' should be used for testing purpose. 2017-02-20 07:02:37.819 92552 WARNING magnum.common.cert_manager.local_cert_manager [req-0d9720f7-eb3e-4c9f-870b-e26feb26b9e2 admin openstack - - -] Storing certificate data on the local filesystem. CertM anager type 'local' should be used for testing purpose. 2017-02-20 07:02:40.026 92552 ERROR magnum.drivers.common.template_def [req-0d9720f7-eb3e-4c9f-870b-e26feb26b9e2 admin openstack - - -] HTTPSConnectionPool(host='discovery.etcd.io', port=443): Max retries exceeded with url: /new?size=1 (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 113] EH OSTUNREACH',)) 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server [req-0d9720f7-eb3e-4c9f-870b-e26feb26b9e2 admin openstack - - -] Exception during message handling 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server Traceback (most recent call last): 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 133, in _process_incoming 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message) 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 150, in dispatch 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args) 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 121, in _do_dispatch 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args) 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/magnum/conductor/handlers/cluster_conductor.py", line 165, in cluster_create 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server create_timeout) 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/magnum/conductor/handlers/cluster_conductor.py", line 97, in _create_stack 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server _extract_template_definition(context, cluster)) 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server File 
"/usr/lib/python2.7/site-packages/magnum/conductor/handlers/cluster_conductor.py", line 82, in _extract_template_definition 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server scale_manager=scale_manager) 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/magnum/drivers/common/template_def.py", line 337, in extract_definition 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server self.get_params(context, cluster_template, cluster, **kwargs), 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/magnum/drivers/k8s_opensuse_v1/template_def.py", line 50, in get_params 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server extra_params['discovery_url'] = self.get_discovery_url(cluster) 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/magnum/drivers/common/template_def.py", line 445, in get_discovery_url 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server discovery_endpoint=discovery_endpoint) 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server GetDiscoveryUrlFailed: Failed to get discovery url from 'https://discovery.etcd.io/new?size=1'. 2017-02-20 07:02:40.064 92552 ERROR oslo_messaging.rpc.server Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE On 2/20/17, 12:41 AM, "Johannes Grassler" > wrote: Hello Rushi, On 02/20/2017 12:26 AM, Ns, Rushi wrote: Hi Johannes/Vincent Thank you to both for the detailed. I did those steps as per the link https://www.suse.com/documentation/suse-openstack-cloud-7/book_cloud_suppl/data/sec_deploy_kubernetes_without.html you provided before executing the cluster as I learned this in the document , however I am sure I did something wrong as ii don?t know what public etcd discovery url since I don?t have anything setup on my end. Here are the command I used and if you see I specified that parameter as you suggested but only as ?URL? without knowing the real value of ?URL? (--labels registry_url=URL) , so this is my mistake or how it should be used ? I am not sure, but I followed your document ? ---------------------------------- 1) magnum cluster-template-create --name k8s_template --image-id sles-openstack-magnum-kubernetes --keypair-id default --external-network-id floating --dns-nameserver 8.8.8.8 --flavor-id m1.magnum --master-flavor-id m1.magnum --docker-volume-size 5 --network-driver flannel --coe kubernetes --floating-ip-enabled --tls-disabled --registry-enabled --labels insecure_registry_url=URL 2) magnum cluster-create --name k8s_cluster --cluster-template k8s_template --master-count 1 --node-count 2 --discovery-url none ----------------------------------- Now I would like to understand where and how I can setup my own local etcd discovery service ? is it required. As far as I know etcd it is. I may be wrong though. Luckily there is another solution: Also our internet access is through proxy port (http://proxy.pal.sap.corp:8080) so if you can guide how to do that setup, I can do or tell me the URL value to specified and I can try. Just add an `--http-proxy http://proxy.pal.sap.corp:8080` <%20%20> when creating the cluster template and do NOT provide any discovery URL options for either the cluster template or the cluster itself. Provided the proxy doesn't require authentication this should do the trick... 
Cheers, Johannes Also I wanted to inform that, we had issue Horizon (public and admin page IP is not hand shake) with BETA 8 Neutron going with VLAN open switch, Nicolas and I had some sessions towards and Nicolas suggested to use ?LinuxBridge instead openvswith? since the patch he has may not be in the BETA8 that I download. . you can check with Nicolas on this as our current BEtA8 seems not good with VLAN/openvswitch. At any cost, I will remove this cluster and rebuild it soon but I wail wait until the full GA build comes out instead of BETA 8 or I can try if you can think the latest BETA 8 will not have issues overall. Please suggest and provide me the help for above value ?labels insecure_registry_url=URL? or how to setup local etc discovery service ? Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE On 2/17/17, 1:14 AM, "Vincent Untz" > wrote: Hi, Le vendredi 17 f?vrier 2017, ? 10:02 +0100, Johannes Grassler a ?crit : Hello Rushi, sorry, this took me a while to figure out. This is not the issue I initially thought it was. Rather it appears to be related to your local networking setup and/or the cluster template you used. This is the crucial log excerpt: | 2017-02-05 21:32:52.915 92552 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/magnum/drivers/common/template_def.py", line 445, in get_discovery_url | 2017-02-05 21:32:52.915 92552 ERROR oslo_messaging.rpc.server discovery_endpoint=discovery_endpoint) | 2017-02-05 21:32:52.915 92552 ERROR oslo_messaging.rpc.server GetDiscoveryUrlFailed: Failed to get discovery url from 'https://discovery.etcd.io/new?size=1'. Magnum uses etcd to orchestrate its clusters' instances. To that end it requires a discovery URL where cluster members announce their presence. By default Magnum uses the public etcd discovery URL https://discovery.etcd.io/new?size=%(size)d This will not work in an environment without Internet access which I presume yours is. The solution to this problem is to set up a local etcd discovery service and configure its URL template in magnum.conf: [cluster] etcd_discovery_service_endpoint_format = https://my.discovery.service.local/new?size=%(size)d Ah, this use case is in our doc. Rushi, can you follow what's documented at: https://www.suse.com/documentation/suse-openstack-cloud-7/book_cloud_suppl/data/sec_deploy_kubernetes_without.html Vincent Cheers, Johannes On 02/16/2017 05:03 AM, Ns, Rushi wrote: Hi Simon. Some reason the mail I sent this morning didn?t go, also did?t bounced back but I found it was stuck in my drafts. Anyways, sorry about the delay . Here you go again. Please find attached files of magnum as requested. Please find below output of other commands result. 
------ root at d38-ea-a7-93-e6-64:/var/log # openstack user list +----------------------------------+---------------------+ | ID | Name | +----------------------------------+---------------------+ | d6a6e5c279734387ae2458ee361122eb | admin | | 7cd6e90b024e4775a772449f3aa135d9 | crowbar | | ea68b8bd8e0e4ac3a5f89a4e464b6054 | glance | | c051a197ba644a25b85e9f41064941f6 | cinder | | 374f9b824b9d43d5a7d2cf37505048f0 | neutron | | 062175d609ec428e876ee8f6e0f39ad3 | nova | | f6700a7f9d794819ab8fa9a07997c945 | heat | | dd22c62394754d95a8feccd44c1e2857 | heat_domain_admin | | 9822f3570b004cdca8b360c2f6d4e07b | aodh | | ac06fd30044e427793f7001c72f92096 | ceilometer | | d694b84921b04f168445ee8fcb9432b7 | magnum_domain_admin | | bf8783f04b7a49e2adee33f792ae1cfb | magnum | | 2289a8f179f546239fe337b5d5df48c9 | sahara | | 369724973150486ba1d7da619da2d879 | barbican | | 71dcd06b2e464491ad1cfb3f249a2625 | manila | | e33a098e55c941e7a568305458e2f8fa | trove | +----------------------------------+---------------------+ root at d38-ea-a7-93-e6-64:/var/log # openstack domain list +----------------------------------+---------+---------+-------------------------------------------+ | ID | Name | Enabled | Description | +----------------------------------+---------+---------+-------------------------------------------+ | default | Default | True | The default domain | | f916a54a4c0b4a96954bad9f9b797cf3 | heat | True | Owns users and projects created by heat | | 51557fee0408442f8aacc86e9f8140c6 | magnum | True | Owns users and projects created by magnum | +----------------------------------+---------+---------+-------------------------------------------+ root at d38-ea-a7-93-e6-64:/var/log # openstack role assignment list +----------------------------------+----------------------------------+-------+----------------------------------+----------------------------------+-----------+ | Role | User | Group | Project | Domain | Inherited | +----------------------------------+----------------------------------+-------+----------------------------------+----------------------------------+-----------+ | 6c56316ecd36417184629f78fde5694c | d6a6e5c279734387ae2458ee361122eb | | 6d704aa281874622b02a4e24954ede18 | | False | | 9fe2ff9ee4384b1894a90878d3e92bab | 7cd6e90b024e4775a772449f3aa135d9 | | 7a18242f8e1c4dd9b42d31facb79493f | | False | | 6c56316ecd36417184629f78fde5694c | d6a6e5c279734387ae2458ee361122eb | | 7a18242f8e1c4dd9b42d31facb79493f | | False | | 932db80652074571ba1b98738c5af598 | 7cd6e90b024e4775a772449f3aa135d9 | | 7a18242f8e1c4dd9b42d31facb79493f | | False | | 9fe2ff9ee4384b1894a90878d3e92bab | ea68b8bd8e0e4ac3a5f89a4e464b6054 | | 19c2c03e858b47da83eda020aa83639e | | False | | 6c56316ecd36417184629f78fde5694c | ea68b8bd8e0e4ac3a5f89a4e464b6054 | | 19c2c03e858b47da83eda020aa83639e | | False | | 9fe2ff9ee4384b1894a90878d3e92bab | c051a197ba644a25b85e9f41064941f6 | | 19c2c03e858b47da83eda020aa83639e | | False | | 6c56316ecd36417184629f78fde5694c | c051a197ba644a25b85e9f41064941f6 | | 19c2c03e858b47da83eda020aa83639e | | False | | 9fe2ff9ee4384b1894a90878d3e92bab | 374f9b824b9d43d5a7d2cf37505048f0 | | 19c2c03e858b47da83eda020aa83639e | | False | | 6c56316ecd36417184629f78fde5694c | 374f9b824b9d43d5a7d2cf37505048f0 | | 19c2c03e858b47da83eda020aa83639e | | False | | 9fe2ff9ee4384b1894a90878d3e92bab | 062175d609ec428e876ee8f6e0f39ad3 | | 19c2c03e858b47da83eda020aa83639e | | False | | 6c56316ecd36417184629f78fde5694c | 062175d609ec428e876ee8f6e0f39ad3 | | 19c2c03e858b47da83eda020aa83639e | | False | | 
9fe2ff9ee4384b1894a90878d3e92bab | f6700a7f9d794819ab8fa9a07997c945 | | 19c2c03e858b47da83eda020aa83639e | | False | | 6c56316ecd36417184629f78fde5694c | f6700a7f9d794819ab8fa9a07997c945 | | 19c2c03e858b47da83eda020aa83639e | | False | | 932db80652074571ba1b98738c5af598 | d6a6e5c279734387ae2458ee361122eb | | 7a18242f8e1c4dd9b42d31facb79493f | | False | | 6c56316ecd36417184629f78fde5694c | dd22c62394754d95a8feccd44c1e2857 | | | f916a54a4c0b4a96954bad9f9b797cf3 | False | | 9fe2ff9ee4384b1894a90878d3e92bab | 9822f3570b004cdca8b360c2f6d4e07b | | 19c2c03e858b47da83eda020aa83639e | | False | | 6c56316ecd36417184629f78fde5694c | 9822f3570b004cdca8b360c2f6d4e07b | | 19c2c03e858b47da83eda020aa83639e | | False | | 9fe2ff9ee4384b1894a90878d3e92bab | ac06fd30044e427793f7001c72f92096 | | 19c2c03e858b47da83eda020aa83639e | | False | | 6c56316ecd36417184629f78fde5694c | ac06fd30044e427793f7001c72f92096 | | 19c2c03e858b47da83eda020aa83639e | | False | | 6c56316ecd36417184629f78fde5694c | d694b84921b04f168445ee8fcb9432b7 | | | 51557fee0408442f8aacc86e9f8140c6 | False | | 9fe2ff9ee4384b1894a90878d3e92bab | bf8783f04b7a49e2adee33f792ae1cfb | | 19c2c03e858b47da83eda020aa83639e | | False | | 6c56316ecd36417184629f78fde5694c | bf8783f04b7a49e2adee33f792ae1cfb | | 19c2c03e858b47da83eda020aa83639e | | False | | 9fe2ff9ee4384b1894a90878d3e92bab | 2289a8f179f546239fe337b5d5df48c9 | | 19c2c03e858b47da83eda020aa83639e | | False | | 6c56316ecd36417184629f78fde5694c | 2289a8f179f546239fe337b5d5df48c9 | | 19c2c03e858b47da83eda020aa83639e | | False | | 9fe2ff9ee4384b1894a90878d3e92bab | 369724973150486ba1d7da619da2d879 | | 19c2c03e858b47da83eda020aa83639e | | False | | 6c56316ecd36417184629f78fde5694c | 369724973150486ba1d7da619da2d879 | | 19c2c03e858b47da83eda020aa83639e | | False | | 9fe2ff9ee4384b1894a90878d3e92bab | 71dcd06b2e464491ad1cfb3f249a2625 | | 19c2c03e858b47da83eda020aa83639e | | False | | 6c56316ecd36417184629f78fde5694c | 71dcd06b2e464491ad1cfb3f249a2625 | | 19c2c03e858b47da83eda020aa83639e | | False | | 9fe2ff9ee4384b1894a90878d3e92bab | e33a098e55c941e7a568305458e2f8fa | | 19c2c03e858b47da83eda020aa83639e | | False | | 6c56316ecd36417184629f78fde5694c | e33a098e55c941e7a568305458e2f8fa | | 19c2c03e858b47da83eda020aa83639e | | False | +----------------------------------+----------------------------------+-------+----------------------------------+----------------------------------+-----------+ Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE On 2/15/17, 11:01 AM, "Ns, Rushi" > wrote: Hi Simon, I am sorry, I got stuck. Sure I will send the logs now . Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE On 2/15/17, 10:26 AM, "Simon Briggs" > wrote: Hi Rushi, I assume you where unable to join our call. Would it be possible to collect the logs that we request, as this is the only way my teams can help you remotely. Regards Simon Briggs On 15/02/17 08:58, Johannes Grassler wrote: Hello Rushi, ok. Can you please supply 1) A supportconfig tarball: this will have the contents of both /etc/magnum/magnum.conf.d/ and magnum-conductor.log which should allow me to figure out what is wrong. 2) The output of `openstack user list`, `openstack domain list`, `openstack role assignment list` (all run as the admin user). With that information I should be able to figure out whether your problem is the one I mentioned earlier. Cheers, Johannes On 02/14/2017 04:42 PM, Ns, Rushi wrote: Hello Johannes, Thank you for the information. 
FYI, my setup is not on HA . Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE On 2/14/17, 12:43 AM, "Johannes Grassler" > wrote: Hello Rushi, if the problem is the | Creating cluster failed for the following reason(s): Failed to create trust Error ID: c7a27e1f-6a6a-452e-8d29-a38dbaa3fd78, Failed to create trust Error ID: a9f328cc-05e8-4c87-9876-7db5365812f2 error you mentioned below, the problem is likely to be with Magnum rather than with Heat. Magnum creates a Keystone trust for each cluster that the cluster's VMs use to talk to the Magnum API among others. We had a spell of trouble[0] with that recently and you may be running into the same problem, especially if you are running an HA setup. Are you? If so, check if all files in /etc/magnum/magnum.conf.d/ match across all controller nodes. If there are differences, especially in the [trust] section you are probably affected by the same issue we ran into recently. Cheers, Johannes [0] https://github.com/crowbar/crowbar-openstack/pull/843 On 02/14/2017 09:01 AM, Simon Briggs wrote: Hi Rushi, You advise that you still have an issue. Would this still be the same as the one that Vincent helped with below? I have added Johannes to those CC'd as he is skilled in debugging that type of error. Thanks Simon Sent from my Samsung device -------- Original message -------- From: Vincent Untz > Date: 06/02/2017 12:39 (GMT+02:00) To: Rushi Ns > Cc: Michal Jura >, Nicolas Bock >, Simon Briggs > Subject: Re: Weekly review of SAP Big Data SOC 7 testing Rushi, About the "Failed to create trust": can you check the heat logs? My guess is that the error comes from there and more context about what's happening around that error would probably be useful. Thanks, Vincent Le lundi 06 f?vrier 2017, ? 04:01 +0000, Ns, Rushi a ?crit : Hi Simon Thank you. Please try if Michal can give some information about the image of kubernetes and how to consume. To me I have full knowledge of kubernetes since from long time also we are in production kubernetes in Germany for many projects which I did. Anyways, please try to get Michal for 1 or 2 hours discussion so that I get idea also please help to find the image from the link provided is not available at this time. http://download.suse.de/ibs/Devel:/Docker:/Images:/SLE12SP2-JeOS-k8s-magnum/images/sles-openstack-magnum-kubernetes.x86_64.qcow2 @Michal: Would you be kind to help me to get the Kuberentes image as bove link is not working Regards to SAHARA, I made progress of upload image (mirantis prepared images of SAHARA Hadoop) and created the necessary configuration (cluster templates, node templates and everything) and at the final creating a cluster from template erord with the following. , so I really need someone from your team having SAHARA knowledge would help to get the issue fixed. here is the error while creating cluster. Creating cluster failed for the following reason(s): Failed to create trust Error ID: c7a27e1f-6a6a-452e-8d29-a38dbaa3fd78, Failed to create trust Error ID: a9f328cc-05e8-4c87-9876-7db5365812f2 [cid:image001.png at 01D27FEA.9C6BC8F0] Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Simon Briggs > Date: Saturday, February 4, 2017 at 1:57 AM To: "Ns, Rushi" > Cc: Michal Jura >, Nicolas Bock >, Vincent Untz > Subject: Re: Weekly review of SAP Big Data SOC 7 testing Hi Rushi, Thanks for the update and I'm glad we are moving forward. We'll done everyone. 
Michal is indeed an expert around these services, though I am aware he is presently on a sprint team mid cycle so he may find it difficult to do his required workload and deal with external work as well. So please be patient if it takes a small amount of time for him to respond Thanks Simon Sent from my Samsung device -------- Original message -------- From: "Ns, Rushi" > Date: 04/02/2017 02:19 (GMT+00:00) To: Simon Briggs > Cc: Michal Jura >, Nicolas Bock > Subject: Re: Weekly review of SAP Big Data SOC 7 testing HI Simon, Just to give you update. The Horizon issue was resolved changing the Nuetron from OPENVSWITCH to LinuxBridge as mentioned by Nick. Now I need to move forward for SAHARA which I can try, but if I run into issues, I might need some expertise who will be having SAHARA knowledge from your team. Regards to other request Magnum (kubernetes) I would like to discuss with Michal Jura (mjura at suse.com), I have Cc?d here as I was going through his github document https://github.com/mjura/kubernetes-demo but wasn?t able to find the image as he specified Link to the image http://download.suse.de/ibs/Devel:/Docker:/Images:/SLE12SP2-JeOS-k8s-magnum/images/sles-openstack-magnum-kubernetes.x86_64.qcow2 Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Simon Briggs > Date: Friday, February 3, 2017 at 10:38 AM To: "Ns, Rushi" > Subject: Re: Weekly review of SAP Big Data SOC 7 testing Hi, Sorry about delaying you. I will coordinate with Nick to get the best resource for you. Thanks Simon Sent from my Samsung device -------- Original message -------- From: "Ns, Rushi" > Date: 03/02/2017 18:33 (GMT+00:00) To: Simon Briggs > Subject: Re: Weekly review of SAP Big Data SOC 7 testing Hi Simon, Thank you, I waited on the call, however the toll free number is not US number which call never went through(the Toll free seems UK ), but I stayed on GOtoMEETiNG for 15 mins and disconnected. Sure, I will sync up with Nick and yes you are right it seems not aa code issue, however we are not sure which I will check with Nick in about 1 hour . Keep you posted. Also I need help on Magnum (kubernetes side as well) I see a person Michal Jura (mjura at suse.com) I spoke with Nick to bring Michal on another call to start the Magnum stuff. Can you try to arrange Michal to be with me next week for a short call after this Horizon issue fixed and SAHARA works only after I will work with Michal Jura. Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Simon Briggs > Date: Friday, February 3, 2017 at 10:28 AM To: "Ns, Rushi" > Subject: Re: Accepted: Weekly review of SAP Big Data SOC 7 testing Hi Rushi, I'm afraid because I'm used to finishing at dinner on Fridays and so it slipped my mind that we had a 6pm arranged. Sorry. I am available now to talk if you want, though I have spoken to Nick and he advised he has tested your Horizon setup and it works OK on his replica environment of what you have. With this situation we can only work with the premises that the Horizon issue is not a code problem but is local to your configuration. He did say he was going to try and help you today on this matter. Did this help? Kind regards Simon Briggs Sent from my Samsung device -------- Original message -------- From: "Ns, Rushi" > Date: 02/02/2017 14:22 (GMT+00:00) To: Simon Briggs > Subject: Accepted: Weekly review of SAP Big Data SOC 7 testing -- Les gens heureux ne sont pas press?s. 
-- Johannes Grassler, Cloud Developer SUSE Linux GmbH, HRB 21284 (AG Nürnberg) GF: Felix Imendörffer, Jane Smithard, Graham Norton Maxfeldstr. 5, 90409 Nürnberg, Germany -- Mit freundlichen Grüßen / Best regards Carsten Duch Sales Engineer SUSE Nördlicher Zubringer 9-11 40470 Düsseldorf (P)+49 173 5876 707 (H)+49 521 9497 6388 carsten.duch at suse.com -- SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
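On the Kubernetes image discussed in this thread: once a suitable sles-openstack-magnum-kubernetes qcow2 has been obtained, it still needs to be registered in Glance before a Magnum cluster template can reference it. A rough sketch follows; the image name and the os_distro value are assumptions and should be checked against the image's own documentation, since Magnum generally refuses to use an image that lacks the os_distro property:

    # register the image in Glance (name and os_distro value are placeholders)
    openstack image create \
      --disk-format qcow2 --container-format bare \
      --file sles-openstack-magnum-kubernetes.x86_64.qcow2 \
      --property os_distro=opensuse \
      sles-magnum-k8s
    # verify the property landed on the image
    openstack image show sles-magnum-k8s -c properties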
From rushi.ns at sap.com Tue Nov 21 09:07:08 2017 From: rushi.ns at sap.com (Ns, Rushi) Date: Tue, 21 Nov 2017 16:07:08 +0000 Subject: [caasp-beta] Antw: kubeconfig download error with DEX internal server error , Login error In-Reply-To: <20171121154427.5c2ktbxlyhisbt3x@freedom> References: <7AE2FFC1-1D5A-4814-BE1C-075D837A245D@sap.com> <5A1409E40200001C0030593B@prv-mh.provo.novell.com> <6E31F090-F2AA-47BF-A9BF-62FE474CC079@sap.com> <20171121154427.5c2ktbxlyhisbt3x@freedom> Message-ID: Hi Rafael/Martin, Thank you. I have also filed the new bug below, since I wasn't able to access the bug you filed (I was getting an "authorization error"): Bug 1069251 has been added to the database Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE On 11/21/17, 7:44 AM, "Rafael Fernández López" wrote: Hello Rushi, On Tue, Nov 21, 2017 at 03:29:36PM +0000, Ns, Rushi wrote: > Hi Martin > > Thank you. Yes I did, I used the load balancer IP (lvsusekub8.pal.sap.corp), which is outside the cluster node addresses. The host I've specified is not the Velum IP, not the master IP and not any of the worker IPs. We are studying the problem at the moment. From what I could see, this is what's happening: - When the admin node starts, it generates several certificates based on the information about the machine at the moment of boot (transient and static hostnames, attached IP addresses...). These hostnames get added to some initial certificates in their SAN extensions, as well as the attached IP addresses of the admin node. - When you enter the internal dashboard FQDN (first field on the first page of the Velum setup), if you enter an external name not detected by this very first step (e.g. with cloud-init you chose `admin` as the hostname, and in this field you write `admin.my.company`), the certificate used by LDAP won't contain `admin.my.company`, whereas Dex will try to connect to the LDAP instance on the admin node using `admin.my.company:389`. This effectively makes the TLS handshake fail, and Dex is unable to authenticate the user against LDAP. More information will be added to the bug report, and we'll keep you updated. Thank you. -- Cheers, Rafa.
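One way to see which names actually ended up in the certificate's SAN list is to dump the certificate with openssl. The file path below is only an illustration, not the confirmed location of the LDAP certificate on a CaaSP admin node, and the live check against port 389 additionally assumes an OpenSSL new enough (1.1.1+) to support -starttls ldap:

    # inspect a certificate file on the admin node (path is an assumption)
    openssl x509 -in /path/to/ldap-server.crt -noout -text | grep -A2 'Subject Alternative Name'
    # or fetch it from the LDAP port directly (requires openssl with -starttls ldap support)
    openssl s_client -connect admin.my.company:389 -starttls ldap </dev/null 2>/dev/null \
      | openssl x509 -noout -text | grep -A2 'Subject Alternative Name'

If the external FQDN entered in Velum is missing from that list, that matches the failure mode Rafa describes.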
From Martin.Weiss at suse.com Wed Nov 22 01:09:15 2017 From: Martin.Weiss at suse.com (Martin Weiss) Date: Wed, 22 Nov 2017 01:09:15 -0700 Subject: [caasp-beta] Antw: kubeconfig download error with DEX internal server error , Login error In-Reply-To: <6E31F090-F2AA-47BF-A9BF-62FE474CC079@sap.com> References: <7AE2FFC1-1D5A-4814-BE1C-075D837A245D@sap.com> <5A1409E40200001C0030593B@prv-mh.provo.novell.com> <6E31F090-F2AA-47BF-A9BF-62FE474CC079@sap.com> Message-ID: <5A1530AB0200001C00305E1E@prv-mh.provo.novell.com> Hi Rushi, I have learned that the entry in the hosts file is correct, as it points to haproxy, which should also be running on the admin host. Could you verify that the admin host has haproxy running, and check whether there are specific errors in dashboard.log? Thanks Martin >Hi Martin Thank you. Yes I did, I used the load balancer IP (lvsusekub8.pal.sap.corp), which is outside the cluster node addresses. The host I've specified is not the Velum IP, not the master IP and not any of the worker IPs. Yes, I do have the same entry you mentioned in my /etc/hosts file (FYI: lvsusekub2.pal.sap.corp is my API server FQDN). Here is my /etc/hosts file: #-- start Salt-CaaSP managed hosts - DO NOT MODIFY -- ### service names ### 127.0.0.1 api api.infra.caasp.local lvsusekub2.pal.sap.corp Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Martin Weiss Date: Tuesday, November 21, 2017 at 3:18 AM To: "caasp-beta at lists.suse.com" , Rushi NS Subject: Antw: [caasp-beta] kubeconfig download error with DEX internal server error , Login error Hi Rushi, did you specify a specific external FQDN for the API? Could you check if you have a similarly strange entry in the /etc/hosts file on the admin node, with 127.0.0.1 api ... ? --> this was blocking my Velum from contacting the API on a master, and because of that I could not download the kube-config. Martin Hello Team, I built the cluster with the latest SUSE CaaSP 2.0 and was getting errors with Dex authentication when downloading the kubeconfig file from the Velum web interface. Did anyone experience this error? I did multiple setups (multi-master and single-master) but both clusters have the same error. My initial thought was that the error was tied to the multi-master setup (I set up multi-master first); however, even with a single master I got the same error, so I am not sure if this is a bug, but I can't download the kubeconfig file from Velum. I got this error: -------- "internal server error , Login error" ------ My login to Velum works fine with the same credentials; however, for the kubeconfig download the authentication fails. Let me know if anyone has experienced the same. Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE _______________________________________________ caasp-beta mailing list caasp-beta at lists.suse.com http://lists.suse.com/mailman/listinfo/caasp-beta From kukuk at suse.com Wed Nov 22 01:37:34 2017 From: kukuk at suse.com (Thorsten Kukuk) Date: Wed, 22 Nov 2017 09:37:34 +0100 Subject: [caasp-beta] Antw: kubeconfig download error with DEX internal server error , Login error In-Reply-To: <5A1530AB0200001C00305E1E@prv-mh.provo.novell.com> References: <7AE2FFC1-1D5A-4814-BE1C-075D837A245D@sap.com> <5A1409E40200001C0030593B@prv-mh.provo.novell.com> <6E31F090-F2AA-47BF-A9BF-62FE474CC079@sap.com> <5A1530AB0200001C00305E1E@prv-mh.provo.novell.com> Message-ID: <20171122083734.GA23961@suse.com> On Wed, Nov 22, Martin Weiss wrote: > Hi Rushi, > > I have learned that the entry in the hosts file is correct as this is pointing > to haproxy that should also be running on the admin host. The line from /etc/hosts is: 127.0.0.1 api api.infra.caasp.local lvsusekub2.pal.sap.corp The lvsusekub2.pal.sap.corp entry in this line is clearly not correct. I don't know whether this is one of the reasons for the problems seen here, but it is asking for trouble. I also don't know what adds this entry to 127.0.0.1. Thorsten -- Thorsten Kukuk, Distinguished Engineer, Senior Architect SLES & CaaSP SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nuernberg, Germany GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
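Following on from Thorsten's point, a quick sketch of how to check what that name actually resolves to on the admin node versus an outside client; the FQDNs below are simply the ones quoted in this thread:

    # on the admin node: show the Salt-managed line in question
    grep -n 'api.infra.caasp.local' /etc/hosts
    # how the external API FQDN resolves locally (shows 127.0.0.1 if the hosts entry wins)
    getent hosts lvsusekub2.pal.sap.corp
    # from a client outside the cluster, the same names should resolve to the real
    # load balancer / master address instead of loopback
    getent hosts lvsusekub2.pal.sap.corp api.infra.caasp.local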
From aj at suse.com Wed Nov 22 02:50:53 2017 From: aj at suse.com (Andreas Jaeger) Date: Wed, 22 Nov 2017 10:50:53 +0100 Subject: [caasp-beta] CaaS 2 root password In-Reply-To: <8fc5fe4b-6160-d12e-5353-84145dd5cb04@suse.com> References: <8fc5fe4b-6160-d12e-5353-84145dd5cb04@suse.com> Message-ID: <86c5e6a5-e6d6-d356-880b-083e51dfcb9c@suse.com> On 2017-11-21 17:02, vinicius wrote: > Hello All. > > I have two questions about CaaS: > > 1) How can I change the root password? Which one? > 2) For example, I have two CaaS setups, each with the same logical and physical configuration, say 3 masters and 3 workers for infra A and the same configuration for infra B. If I want to remove one worker node from A and re-insert that node into the infra B setup without reinstalling the node, is that possible? Not currently - but reinstall should be quick ;) Andreas -- Andreas Jaeger aj@{suse.com,opensuse.org} Twitter: jaegerandi SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg) GPG fingerprint = 93A3 365E CE47 B889 DF7F FED1 389A 563C C272 A126 From rfernandezlopez at suse.de Tue Nov 21 08:44:27 2017 From: rfernandezlopez at suse.de (Rafael Fernández López) Date: Tue, 21 Nov 2017 16:44:27 +0100 Subject: [caasp-beta] Antw: kubeconfig download error with DEX internal server error , Login error In-Reply-To: <6E31F090-F2AA-47BF-A9BF-62FE474CC079@sap.com> References: <7AE2FFC1-1D5A-4814-BE1C-075D837A245D@sap.com> <5A1409E40200001C0030593B@prv-mh.provo.novell.com> <6E31F090-F2AA-47BF-A9BF-62FE474CC079@sap.com> Message-ID: <20171121154427.5c2ktbxlyhisbt3x@freedom> Hello Rushi, On Tue, Nov 21, 2017 at 03:29:36PM +0000, Ns, Rushi wrote: > Hi Martin > > Thank you. Yes I did, I used the load balancer IP (lvsusekub8.pal.sap.corp), which is outside the cluster node addresses. The host I've specified is not the Velum IP, not the master IP and not any of the worker IPs. We are studying the problem at the moment. From what I could see, this is what's happening: - When the admin node starts, it generates several certificates based on the information about the machine at the moment of boot (transient and static hostnames, attached IP addresses...). These hostnames get added to some initial certificates in their SAN extensions, as well as the attached IP addresses of the admin node. - When you enter the internal dashboard FQDN (first field on the first page of the Velum setup), if you enter an external name not detected by this very first step (e.g. with cloud-init you chose `admin` as the hostname, and in this field you write `admin.my.company`), the certificate used by LDAP won't contain `admin.my.company`, whereas Dex will try to connect to the LDAP instance on the admin node using `admin.my.company:389`. This effectively makes the TLS handshake fail, and Dex is unable to authenticate the user against LDAP. More information will be added to the bug report, and we'll keep you updated. Thank you. -- Cheers, Rafa. From vneuhauss at suse.com Wed Nov 22 05:34:02 2017 From: vneuhauss at suse.com (vinicius) Date: Wed, 22 Nov 2017 10:34:02 -0200 Subject: [caasp-beta] Helm transport is closing Message-ID: Hello All. I'm trying to deploy an application using Helm. I'm using Helm client 2.6.1, the same version as the Tiller server. But when I try to do the deploy I get this output: E1122 10:28:50.743105
20036 portforward.go:178] lost connection to pod Error: transport is closing From nikhil at manchanda.me Wed Nov 22 14:52:07 2017 From: nikhil at manchanda.me (Nikhil Manchanda) Date: Wed, 22 Nov 2017 13:52:07 -0800 Subject: [caasp-beta] Helm transport is closing In-Reply-To: References: Message-ID: Hello vinicius: Can you go into more detail about what you are trying to deploy? Sending us a copy of the Tiller logs would also be helpful. That said, I have most often seen that error in the case where your chart is trying to deploy an ELB resource, which is timing out. For CaaSP, we do not support ELB, so if this is the case, then the error is expected. You will need to update your chart (or your values.yaml) to pick a different way of accessing your service (something like NodePort would work, or ingress if you happen to have an ingress controller deployed). Hope this helps, Cheers, Nikhil On Wed, Nov 22, 2017 at 4:34 AM, vinicius wrote: > Hello All. > > I'm trying to deploy an application using Helm. I'm using Helm client 2.6.1, the same version as the Tiller server. But when I try to do the deploy I get this output: > > E1122 10:28:50.743105 20036 portforward.go:178] lost connection to pod > Error: transport is closing > > _______________________________________________ > caasp-beta mailing list > caasp-beta at lists.suse.com > http://lists.suse.com/mailman/listinfo/caasp-beta
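If the chart exposes its service type as a value (many charts do, though the exact key differs per chart, so the chart and release names below are placeholders rather than anything from this thread), the change Nikhil describes can be made without editing the chart itself:

    # Helm 2 syntax, matching the 2.6.1 client/Tiller used in this thread
    helm install my-chart --name my-release --set service.type=NodePort
    # or keep the override in a values file
    cat > my-values.yaml <<'EOF'
    service:
      type: NodePort
    EOF
    helm install my-chart --name my-release -f my-values.yaml

With NodePort the service is then reached on any worker node's IP at the allocated port, which avoids the LoadBalancer/ELB provisioning that CaaSP does not provide.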
From vmoutoussamy at suse.com Thu Nov 23 06:27:14 2017 From: vmoutoussamy at suse.com (Vincent Moutoussamy) Date: Thu, 23 Nov 2017 14:27:14 +0100 Subject: [caasp-beta] kubeconfig download error with DEX internal server error , Login error In-Reply-To: <22489E4B-B6E2-4FE5-A2FB-6D16C17E733B@sap.com> References: <090536B7-4B1A-4425-89C0-B24F589F7606@sap.com> <90BF4755-A3AB-4F6D-9463-5AFEC7F5DF65@suse.com> <9C1A5FD7-45DA-4C2F-9F89-B91ACEF9436D@sap.com> <22489E4B-B6E2-4FE5-A2FB-6D16C17E733B@sap.com> Message-ID: <8DB46DC4-B018-43B1-A4BF-2F90346DB906@suse.com> Hi Rushi, Yes, please do not contact people directly; prefer caasp-beta at lists.suse.com, or beta-programs at lists.suse.com as an escalation path if you did not receive the proper help or if you want a private communication channel with SUSE. That being said, I'm working on your case and will contact you directly. Have a nice day, Regards, -- Vincent Moutoussamy SUSE Beta Program and SDK Project Manager From rushi.ns at sap.com Thu Nov 23 20:20:08 2017 From: rushi.ns at sap.com (Ns, Rushi) Date: Fri, 24 Nov 2017 03:20:08 +0000 Subject: [caasp-beta] Antw: kubeconfig download error with DEX internal server error , Login error In-Reply-To: <5A1530AB0200001C00305E1E@prv-mh.provo.novell.com> References: <7AE2FFC1-1D5A-4814-BE1C-075D837A245D@sap.com> <5A1409E40200001C0030593B@prv-mh.provo.novell.com> <6E31F090-F2AA-47BF-A9BF-62FE474CC079@sap.com> <5A1530AB0200001C00305E1E@prv-mh.provo.novell.com> Message-ID: <335FE31F-9705-4E5D-A858-9791D4A33D63@sap.com> Hi Martin, Do you mean the admin node (Velum) or the master host? Anyway, I checked both hosts and I see haproxy running as a container on all 3 nodes (LVSUSEKUB4 is my master host, IP 10.48.164.143; the other two IPs are the minion hosts, 10.48.164.142 and 10.48.164.141): lvsusekub4:~ # kubectl get pods -o wide --all-namespaces | grep haproxy 2017-11-24 03:17:29.461307 I | proto: duplicate proto type registered: google.protobuf.Any 2017-11-24 03:17:29.461405 I | proto: duplicate proto type registered: google.protobuf.Duration 2017-11-24 03:17:29.461438 I | proto: duplicate proto type registered: google.protobuf.Timestamp kube-system haproxy-1200eade45cd435ea9b5cad239a496ad.infra.caasp.local 1/1 Running 0 9m 10.48.164.143 1200eade45cd435ea9b5cad239a496ad.infra.caasp.local kube-system haproxy-230ed2652a254f88aa30f1cf97c48b80.infra.caasp.local 1/1 Running 0 9m 10.48.164.142 230ed2652a254f88aa30f1cf97c48b80.infra.caasp.local kube-system haproxy-76b19a46b902429b9a98735a5f607436.infra.caasp.local 1/1 Running 0 9m 10.48.164.141 76b19a46b902429b9a98735a5f607436.infra.caasp.local and also the /etc/hosts entry has the hostname for 127.0.0.1: #-- start Salt-CaaSP managed hosts - DO NOT MODIFY -- ### service names ### 127.0.0.1 api api.infra.caasp.local lvsusekub4.pal.sap.corp Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Martin Weiss Date: Wednesday, November 22, 2017 at 12:16 AM To: "caasp-beta at lists.suse.com" , Rushi NS Subject: Re: Antw: [caasp-beta] kubeconfig download error with DEX internal server error , Login error Hi Rushi, I have learned that the entry in the hosts file is correct, as it points to haproxy, which should also be running on the admin host. Could you verify that the admin host has haproxy running, and check whether there are specific errors in dashboard.log? Thanks Martin > Hi Martin Thank you. Yes I did, I used the load balancer IP (lvsusekub8.pal.sap.corp), which is outside the cluster node addresses. The host I've specified is not the Velum IP, not the master IP and not any of the worker IPs. Yes, I do have the same entry you mentioned in my /etc/hosts file (FYI: lvsusekub2.pal.sap.corp is my API server FQDN). Here is my /etc/hosts file: #-- start Salt-CaaSP managed hosts - DO NOT MODIFY -- ### service names ### 127.0.0.1 api api.infra.caasp.local lvsusekub2.pal.sap.corp Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Martin Weiss Date: Tuesday, November 21, 2017 at 3:18 AM To: "caasp-beta at lists.suse.com" , Rushi NS Subject: Antw: [caasp-beta] kubeconfig download error with DEX internal server error , Login error Hi Rushi, did you specify a specific external FQDN for the API? Could you check if you have a similarly strange entry in the /etc/hosts file on the admin node, with 127.0.0.1 api ... ? --> this was blocking my Velum from contacting the API on a master, and because of that I could not download the kube-config. Martin Hello Team, I built the cluster with the latest SUSE CaaSP 2.0 and was getting errors with Dex authentication when downloading the kubeconfig file from the Velum web interface. Did anyone experience this error? I did multiple setups (multi-master and single-master) but both clusters have the same error. My initial thought was that the error was tied to the multi-master setup (I set up multi-master first); however, even with a single master I got the same error, so I am not sure if this is a bug, but I can't download the kubeconfig file from Velum. I got this error: -------- "internal server error , Login error"
------ My login to Velum works fine with the same credentials; however, for the kubeconfig download the authentication fails. Let me know if anyone has experienced the same. Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE _______________________________________________ caasp-beta mailing list caasp-beta at lists.suse.com http://lists.suse.com/mailman/listinfo/caasp-beta From Martin.Weiss at suse.com Fri Nov 24 01:37:52 2017 From: Martin.Weiss at suse.com (Martin Weiss) Date: Fri, 24 Nov 2017 01:37:52 -0700 Subject: [caasp-beta] Antw: kubeconfig download error with DEX internal server error , Login error In-Reply-To: <335FE31F-9705-4E5D-A858-9791D4A33D63@sap.com> References: <7AE2FFC1-1D5A-4814-BE1C-075D837A245D@sap.com> <5A1409E40200001C0030593B@prv-mh.provo.novell.com> <6E31F090-F2AA-47BF-A9BF-62FE474CC079@sap.com> <5A1530AB0200001C00305E1E@prv-mh.provo.novell.com> <335FE31F-9705-4E5D-A858-9791D4A33D63@sap.com> Message-ID: <5A17DA600200001C00306680@prv-mh.provo.novell.com> Hi Rushi, I mean on the Velum node. From the output below it seems that this is the k8s master - what do you see with "docker ps" on the admin node (the Velum host)? Thanks Martin >Hi Martin, Do you mean the admin node (Velum) or the master host? Anyway, I checked both hosts and I see haproxy running as a container on all 3 nodes (LVSUSEKUB4 is my master host, IP 10.48.164.143; the other two IPs are the minion hosts, 10.48.164.142 and 10.48.164.141): lvsusekub4:~ # kubectl get pods -o wide --all-namespaces | grep haproxy 2017-11-24 03:17:29.461307 I | proto: duplicate proto type registered: google.protobuf.Any 2017-11-24 03:17:29.461405 I | proto: duplicate proto type registered: google.protobuf.Duration 2017-11-24 03:17:29.461438 I | proto: duplicate proto type registered: google.protobuf.Timestamp kube-system haproxy-1200eade45cd435ea9b5cad239a496ad.infra.caasp.local 1/1 Running 0 9m 10.48.164.143 1200eade45cd435ea9b5cad239a496ad.infra.caasp.local kube-system haproxy-230ed2652a254f88aa30f1cf97c48b80.infra.caasp.local 1/1 Running 0 9m 10.48.164.142 230ed2652a254f88aa30f1cf97c48b80.infra.caasp.local kube-system haproxy-76b19a46b902429b9a98735a5f607436.infra.caasp.local 1/1 Running 0 9m 10.48.164.141 76b19a46b902429b9a98735a5f607436.infra.caasp.local and also the /etc/hosts entry has the hostname for 127.0.0.1: #-- start Salt-CaaSP managed hosts - DO NOT MODIFY -- ### service names ### 127.0.0.1 api api.infra.caasp.local lvsusekub4.pal.sap.corp Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Martin Weiss Date: Wednesday, November 22, 2017 at 12:16 AM To: "caasp-beta at lists.suse.com" , Rushi NS Subject: Re: Antw: [caasp-beta] kubeconfig download error with DEX internal server error , Login error Hi Rushi, I have learned that the entry in the hosts file is correct, as it points to haproxy, which should also be running on the admin host. Could you verify that the admin host has haproxy running, and check whether there are specific errors in dashboard.log? Thanks Martin > Hi Martin Thank you. Yes I did, I used the load balancer IP (lvsusekub8.pal.sap.corp), which is outside the cluster node addresses. The host I've specified is not the Velum IP, not the master IP and not any of the worker IPs.
Yes, I do have the same entry you mentioned in my /etc/hosts file (FYI: lvsusekub2.pal.sap.corp is my API server FQDN). Here is my /etc/hosts file: #-- start Salt-CaaSP managed hosts - DO NOT MODIFY -- ### service names ### 127.0.0.1 api api.infra.caasp.local lvsusekub2.pal.sap.corp Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE From: Martin Weiss Date: Tuesday, November 21, 2017 at 3:18 AM To: "caasp-beta at lists.suse.com" , Rushi NS Subject: Antw: [caasp-beta] kubeconfig download error with DEX internal server error , Login error Hi Rushi, did you specify a specific external FQDN for the API? Could you check if you have a similarly strange entry in the /etc/hosts file on the admin node, with 127.0.0.1 api ... ? --> this was blocking my Velum from contacting the API on a master, and because of that I could not download the kube-config. Martin Hello Team, I built the cluster with the latest SUSE CaaSP 2.0 and was getting errors with Dex authentication when downloading the kubeconfig file from the Velum web interface. Did anyone experience this error? I did multiple setups (multi-master and single-master) but both clusters have the same error. My initial thought was that the error was tied to the multi-master setup (I set up multi-master first); however, even with a single master I got the same error, so I am not sure if this is a bug, but I can't download the kubeconfig file from Velum. I got this error: -------- "internal server error , Login error" ------ My login to Velum works fine with the same credentials; however, for the kubeconfig download the authentication fails. Let me know if anyone has experienced the same. Best Regards, Rushi. I MAY BE ONLY ONE PERSON, BUT I CAN BE ONE PERSON WHO MAKES A DIFFERENCE _______________________________________________ caasp-beta mailing list caasp-beta at lists.suse.com http://lists.suse.com/mailman/listinfo/caasp-beta
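To answer Martin's "docker ps" question on the admin node, something along these lines would show whether haproxy and the Velum containers are up there, and where the dashboard errors end up; the name filters are assumptions, since the exact container names can differ between CaaSP builds:

    # on the admin (Velum) node
    docker ps --format '{{.Names}}\t{{.Status}}' | grep -Ei 'haproxy|velum'
    # tail the dashboard container log for recent errors, if such a container exists
    docker logs --tail 100 "$(docker ps -q --filter name=velum-dashboard)" 2>&1 | grep -i error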