[sles-beta] GEO Clusters, where to get basic information

Richard Brown rbrown at suse.de
Sun May 25 02:01:05 MDT 2014


On 2014-05-25 09:15, Darren Thompson wrote:
> Team
> 
> I have been using SLES HA for some time but have no exposure to SLES
> GEO Clusters.
> 
> Where is a good source of "basic/orientation" information for someone
> familiar with HA but with no exposure to GEO?

The manual is pretty good :) 
https://www.suse.com/documentation/sle_ha/singlehtml/book_sleha/book_sleha.html#cha.ha.geo

> What is it?

GEO is an extension to the SLE HA Extension which lets you build 
'a Cluster of Clusters' spread across large geographic distances 
(separate cities/states/countries/continents), with services being 
started/stopped on the separate clusters based on the network 
availability (or lack thereof) of the other locations.

> What are its use cases and where should it NOT be used?

Its typical use case is 'Disaster Recovery' or 'Business Continuity'. A 
cluster at Site A has its HA resources and data replicated/mirrored at 
Site B, which is many miles away (e.g. over 30km) over a high-latency 
connection (e.g. over 15ms, too high for 'traditional' HA Local or Metro 
Area clusters). In the event of Site A 'failing', all of the resources 
that usually 'live' at Site A can be started at Site B, providing your 
business with a way to continue operating despite the loss of Site A.

It probably shouldn't be used when your clusters are on the same local 
subnet and/or when latency is sufficiently low to make Metro clustering 
viable - why complicate matters if a simpler option is available?

> What is the basic principle on which it operates?

GEO introduces the concept of 'Tickets' - Tickets are tokens which can 
be granted to or revoked from a Cluster, and used as a dependency for 
the operation of Cluster Resources. E.g. a ticket called TicketA might 
be required by all the Resources that are typically at Site A. When the 
Ticket is revoked from Site A and granted to the Cluster at Site B, Site 
B is then the only cluster able to operate the services that rely on 
TicketA's presence. The distribution of Tickets is handled by a piece of 
software called 'booth'.
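
To make the ticket idea concrete, here is a minimal Python sketch. It is 
only a toy model, not how booth or the cluster stack is implemented: the 
Site class and the move_ticket function are hypothetical names invented 
for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Toy stand-in for one cluster in a Geo setup (hypothetical model)."""
    name: str
    tickets: set = field(default_factory=set)

    def can_run(self, required_ticket: str) -> bool:
        # A resource that depends on a ticket may only run where the ticket is held.
        return required_ticket in self.tickets


def move_ticket(ticket: str, source: Site, target: Site) -> None:
    """Revoke the ticket from one site, then grant it to the other."""
    source.tickets.discard(ticket)  # revoke first, so both sites never hold it at once
    target.tickets.add(ticket)


site_a = Site("Site A", {"TicketA"})
site_b = Site("Site B")

before = (site_a.can_run("TicketA"), site_b.can_run("TicketA"))
move_ticket("TicketA", site_a, site_b)
after = (site_a.can_run("TicketA"), site_b.can_run("TicketA"))

print(before, after)
```

Before the move only Site A may run the resources tied to TicketA; 
afterwards only Site B may.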

Booth is an agent which runs on each cluster, and is configured to be 
aware of the other sites involved in your Geo cluster. Using UDP, it's 
designed to work in high latency environments. Booth usually has long 
expiration times on Tickets, giving significant protection against brief 
connectivity issues. This comes at the potential expense of taking 
longer before automatically failing over, but this isn't the sort of 
thing you'd want the software to make a mistake about - more rapid 
allocation of tickets can of course be done by sysadmins.
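
The trade-off with long expiration times can be sketched in a few lines 
of Python. This is an illustration only: the function name and the 
600-second expiry are assumptions made up for the example, not booth 
defaults.

```python
def ticket_still_valid(last_renewal: float, now: float, expiry_seconds: float) -> bool:
    """A ticket stays valid until expiry_seconds pass without a renewal."""
    return (now - last_renewal) < expiry_seconds


EXPIRY = 600.0  # hypothetical expiry time in seconds, chosen for illustration

# A 30 second network blip: the ticket has not expired, so nothing fails over.
brief_outage = ticket_still_valid(last_renewal=0.0, now=30.0, expiry_seconds=EXPIRY)

# A 15 minute outage: the ticket has expired and could be granted elsewhere.
long_outage = ticket_still_valid(last_renewal=0.0, now=900.0, expiry_seconds=EXPIRY)

print(brief_outage, long_outage)
```

A longer expiry rides out more short outages, but also means a longer 
wait before an automatic failover can happen.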

Arbitrators are special machines running Booth but not running HA 
Clusters. This allows you to avoid split brain scenarios by having a 3rd 
independent server arbitrate about the status of the other Sites. 
Arbitrators are not necessary if you have an odd number of Clusters in 
your Geo Cluster.
Taking my simple 2 site example from earlier, if Sites A and B can't 
communicate with each other, there is no way Booth on either cluster can 
automatically figure out which site is operational. An arbitrator 
running elsewhere however should be able to provide enough information 
to determine whether Site A or B is 'down' or that both sites are 
actually 'up' and it's just a failure of the network link between Site A 
and B.
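
As a rough sketch of why a third party helps, the Python below counts 
how many members of the Geo cluster each member can reach. It is a plain 
majority count written for this explanation and is not booth's actual 
protocol; has_majority and the link sets are hypothetical.

```python
def has_majority(member, members, links):
    """True if member can see more than half of all members, itself included."""
    reachable = 1  # a member always counts itself
    for other in members:
        if other != member and frozenset((member, other)) in links:
            reachable += 1
    return reachable > len(members) / 2


# Two sites only, link between them down: neither side can decide anything.
two = ["A", "B"]
two_site_split = [has_majority(m, two, set()) for m in two]

three = ["A", "B", "arbitrator"]

# Only the A-B link is down: both sites still reach the arbitrator,
# so both are known to be up and the ticket can stay where it is.
link_down = {frozenset(("A", "arbitrator")), frozenset(("B", "arbitrator"))}
link_failure = [has_majority(m, three, link_down) for m in ("A", "B")]

# Site A is really down: only B and the arbitrator can see each other.
a_down = {frozenset(("B", "arbitrator"))}
site_failure = [has_majority(m, three, a_down) for m in ("A", "B")]

print(two_site_split, link_failure, site_failure)
```

With two members a split leaves nobody with a majority; with the 
arbitrator added, a real site failure leaves exactly one site able to 
claim the ticket.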

> Are there configuration examples/templates?

As the details depend totally on the structure of your network and the 
local resources you want to enable for Geo clustering, templates are a 
tricky thing to provide.
The manual has some very simple examples which should be enough to get 
you started.

Best Regards,

- Richard

-- 
-------------------------------------------------------------------
   Richard Brown, QA Engineer
   Phone +4991174053-361,  Fax +4991174053-483
   SUSE LINUX Products GmbH,  Maxfeldstr. 5,  D-90409 Nuernberg
   Geschaeftsfuehrer: Jeff Hawn, Jennifer Guild, Felix Imendoerffer,
   HRB 16746 (AG Nuernberg)
-------------------------------------------------------------------

