Re: [suse-oracle] Linux-Clustering

From: Lars Marowsky-Bree (lmb@suse.de)
Date: Fri Sep 19 2003 - 23:43:39 CEST


Message-ID: <20030919214339.GA5127@marowsky-bree.de>

On 2003-09-19T16:35:28,
   Martin Konold <martin.konold@erfrakon.de> said:

> > > Michael: Does Oracle provide service for DRBD setups?
> > This is being worked on.
> So this alone is imho enough reason not to use DRBD in a mission-critical
> Oracle environment.

As I said, this is being worked on. If you consider deploying DRBD right
now for Oracle, I suggest you contact your Oracle support
representative, and if he says no, please route it through us.

However, that's a completely separate line of reasoning; you were
claiming drbd corrupts data by design, and that was just plain wrong.
Can we agree on that?

> > Except that of course the shared media becomes the single point of
> > failure, unless you have a lot of money to throw at the problem.
> Not really if you use Fibre Channel attached RAID-1 storage with mirroring.

... And that means you need to get the same external SAN box twice.
That's easily twice the storage cost compared to a DRBD solution, where
you can use internal disk subsystems.

And RAID1 mirroring on disk level will also block if one of the disk
subsystems fails to reply until its final status has been determined,
just as drbd will until it has noticed the other node is disconnected or
crashed. There is little difference here.
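
As a rough illustration of that point, here is a toy Python model (not
DRBD's or any RAID controller's actual logic). A synchronous mirror
acknowledges a write only once every leg has answered or has been
declared dead, so the caller waits for the slowest leg or for the failure
timeout. The function name, the latencies and the timeout value are all
made up for the example.

```python
def mirrored_write_wait_ms(leg_latencies_ms, failure_timeout_ms):
    """Time until a synchronous mirrored write can be acknowledged.

    leg_latencies_ms: one entry per mirror leg; None means the leg never replies.
    failure_timeout_ms: how long the mirror waits before declaring a leg dead.
    """
    waits = []
    for latency in leg_latencies_ms:
        if latency is None or latency > failure_timeout_ms:
            # Silent or too-slow leg: we block until the timeout expires.
            waits.append(failure_timeout_ms)
        else:
            waits.append(latency)
    # The write completes only when the slowest leg is accounted for.
    return max(waits)


# Hypothetical numbers: healthy mirror vs. one leg that has gone silent.
healthy = mirrored_write_wait_ms([5.0, 6.0], failure_timeout_ms=10_000)
degraded = mirrored_write_wait_ms([5.0, None], failure_timeout_ms=10_000)
print(f"healthy: {healthy} ms, one leg silent: {degraded} ms")
```

The same function covers both setups; the only difference is whether the
second leg is a second disk subsystem or a network peer.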

> > FailSafe is not actively maintained for SLES8.
> Good to know from one of the developers/porters of failsafe.
>
> Maybe SuSE should change their website to reflect this fact.
>
> http://www.suse.de/de/company/customer_references/services.html#cluster

That's reference customers. Why should we take out a successful
reference? We are not advertising it anywhere as a new technology, but
of course we have successful customer deployments in the field.

> Well replication within a SCSI RAID-1 device is much less error prone
> compared to depending on tcp/ip networking. Networking is unreliable
> by definition.

Well, so is cabling. And cabling you'll need in any case. Both scenarios
will recover from brief network (whether it is FC-AL or GigE) hiccups
accordingly. Your point being?

(And I'd disagree that SCSI RAID1 (on independent busses with two
separate storage subsystems, to be comparable to what drbd gives you) is
inherently less error prone. At least if you do not use optical wires
and FC, which is quite expensive.)

> Writing to a RAID-1 device is typically the very same speed as
> writing to a single device.

Yes. So is drbd. It's "very nearly the same speed" as writing to a
single device, given a little higher latency.

Martin, what do you think EMC or Oracle use internally for replication?
I know it is not drbd, but it's the same principle. Come on. You _know_
that these techniques suffer from the same constraints. They are basic.
In fact, there are SAN / NAS systems built which _do_ use drbd
internally.

> Disk mirroring can easily be scaled up by additionally using striping
> (RAID-10) while DRBD does not scale.

Well, you can add more disks to your drbd device too, and upgrade the
network interconnect. GigE is very close to FC-AL bandwidth (by default,
both are 1 Gbit/s), so if you need more than that you need to bundle links
or add higher bandwidth links in both cases. I don't get your point.

And as I said, 1 Gbit/s is still quite a bit of bandwidth, going close to a
theoretical 90MB/s throughput. How close you can get depends on your
write patterns, but the same is true for RAID1. DRBD _IS_ raid1, just
attaching the remote disk not directly but via a network interconnect.
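
For reference, the unit arithmetic behind these figures can be sketched
in a few lines of Python. The raw line rate of a 1 Gbit/s link works out
to 125 MB/s. The helper name is hypothetical and nothing here is a
measurement; the script only shows what fraction of the raw rate the
90MB/s and 70MB/s figures quoted in this thread correspond to.

```python
def link_throughput_mb_s(line_rate_gbit_s, efficiency=1.0):
    """Convert a line rate in Gbit/s to MB/s (decimal units).

    efficiency is the fraction of the raw rate left for payload (0..1);
    it is a free parameter here, not a property of GigE or FC-AL.
    """
    bits_per_second = line_rate_gbit_s * 1_000_000_000
    return bits_per_second * efficiency / 8 / 1_000_000


raw_mb_s = link_throughput_mb_s(1.0)

# Fractions of the raw rate implied by the figures quoted in this thread.
for quoted_mb_s in (90.0, 70.0):
    fraction = quoted_mb_s / raw_mb_s
    print(f"{quoted_mb_s:.0f} MB/s is {fraction:.0%} of {raw_mb_s:.0f} MB/s raw")
```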

And again, as I said, if you require the very highest performance and if
cost is no issue compared to that, no, you likely will not use drbd. No
argument.

> > I've seen write speeds of 70MB/s (which is all my disks and GigE
> > interconnect could take) with protocol C. Remember latency is typically
> > very good with local GigE interconnects.
> I very much doubt your numbers! I don't believe that you get 70MB/s with DRBD
> over GigE for typical Oracle write performance.

Well, I didn't say that. I got >70MB/s using bonnie and tiobench, which
at least suggests it is not going to be "horribly slow". Maybe it will
only deliver 35MB/s for a given usage pattern or even less, but whether
that is "horribly slow" depends on the setup.

At least I _have_ some measured numbers, while you just spread unfounded
FUD. Sorry, but I admit I don't quite see why you do that. I challenge
you to show me a small to medium sized Oracle workload where drbd really
incurs a "horrible" slowdown and back that up with numbers.

Then we can talk and discuss why drbd is indeed not the right choice for
that scenario. But just claiming it is "horribly slow on principle",
that's a total FUD statement.

Note you also did not comment at all on my explanation of why your
argument that drbd corrupts data is just wrong, too.

I have understood you do not like drbd at all, and will not deploy it
anywhere. I am also not claiming it's perfect. It has its place, as do
other solutions. But I also don't see why you have to spread
unsubstantiated, wrong statements about it, which you cannot prove and
which do not hold.

> All ERP systems I am aware of mainly write _very_ small chunks of data to the
> db. Therefore the performance gets latency-bound. DRBD makes latency much
> worse than direct SCSI.

Numbers, please. How bad is the effect on real workloads?

I think we both need to go and do some research here ;-) I'll find a
database benchmark and run it on top of drbd and natively, and then
we'll know more. But please, unless you _have_ numbers (in which case I
encourage you to share them), you cannot go around claiming things like
"horribly slow"; at least the simpler benchmarks disagree with you.
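
A minimal sketch of how such a comparison could be started, assuming
Python is available on the test machine. It times small writes that are
each followed by fsync, which is the latency-bound pattern under
discussion. The 8 KiB block size, the write count and the file name are
assumptions chosen for illustration, and this is no substitute for a real
database benchmark. Run it once against a file on a drbd-backed
filesystem and once against a local disk, then compare the medians.

```python
import os
import statistics
import tempfile
import time


def measure_sync_write_latency(path, block_size=8192, count=200):
    """Write `count` blocks of `block_size` bytes to `path`, calling fsync
    after each one, and return the per-write latencies in seconds."""
    block = b"\0" * block_size
    latencies = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for _ in range(count):
            start = time.perf_counter()
            os.write(fd, block)
            os.fsync(fd)  # ask the OS to flush the block before the timer stops
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
    return latencies


def summarize(latencies, block_size=8192):
    """Median latency in ms and the throughput that latency allows in MB/s
    when writes are issued strictly one after another."""
    median_s = statistics.median(latencies)
    return {
        "median_ms": median_s * 1000.0,
        "throughput_mb_s": block_size / median_s / 1_000_000,
    }


if __name__ == "__main__":
    # Hypothetical usage: point the directory at the filesystem under test
    # instead of the system default temporary directory.
    with tempfile.TemporaryDirectory() as directory:
        target = os.path.join(directory, "syncwrite.dat")
        print(summarize(measure_sync_write_latency(target, count=50)))
```

With one outstanding write at a time, throughput is simply block size
divided by latency, so any extra per-write latency shows up directly in
the MB/s figure.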

> > possible while being in the same city and barring nuclear strikes ;),
> > realtime and transactionally consistent replica of the database
> > available for failover?
> You can obtain the same feature with FC (though more expensive ;-))

Exactly. The cost/benefit ratio is the point here.

Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
High Availability & Clustering		ever tried. ever failed. no matter.
SuSE Labs				try again. fail again. fail better.
Research & Development, SUSE LINUX AG		-- Samuel Beckett