MySQL Cluster: replication within a cluster

#1 - vadim - 11-07-2010, 10:12 AM

Hi guys!

I'm rather new to MySQL Cluster, so let me ask a question that might
seem a newbie one.
Say I have a cluster consisting of 4 data nodes, with 2 replicas --
so there are 2 cluster groups, like here:
http://dev.mysql.com/doc/refman/5.1/...es-groups.html
So when I make an INSERT query via a MySQL node, how does that
data spread through all partitions? Does it happen simultaneously?

#2 - Axel Schwenke - 11-08-2010, 01:43 AM

vadim <samokhinvadim (AT) gmail (DOT) com> wrote:
Quote:
I'm rather new to MySQL Cluster, so let me ask a question that might
seem a newbie one.
Say I have a cluster consisting of 4 data nodes, with 2 replicas --
so there are 2 cluster groups, like here:

It's called "node groups".

Quote:
So when I make an INSERT query via a MySQL node, how does that
data spread through all partitions?

If you insert a single row, then this row goes to exactly one partition
of the cluster table. Which one depends on the partitioning key and
the value(s) that the new row has in the key column(s).

Now since you have replicas=2, there are 2 nodes in the cluster that
have this partition. Or - using MySQL cluster terminology - all nodes
in a certain node group have it. Those nodes receive and store the
new row.
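
To make the placement concrete, here is a minimal sketch of the idea for the 4-node, 2-replica setup from the question. It is only an illustration: the node IDs are made up, the CRC32 hash is a stand-in and not the hash function NDB actually uses, and the sketch assumes one partition per node group to stay small.

```python
import zlib

NO_OF_REPLICAS = 2
DATA_NODES = [1, 2, 3, 4]  # hypothetical data node IDs


def node_groups(nodes, replicas):
    """Split the data nodes into node groups of `replicas` nodes each."""
    return [nodes[i:i + replicas] for i in range(0, len(nodes), replicas)]


def partition_for(key, num_partitions):
    """Map a partitioning-key value to a partition number.

    Stand-in hash for illustration only, not NDB's real hash function.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def nodes_for(key, nodes=DATA_NODES, replicas=NO_OF_REPLICAS):
    """Return (partition, nodes that store the row) for a key value."""
    groups = node_groups(nodes, replicas)
    # Assumption: one partition per node group, to keep the sketch small.
    partition = partition_for(key, len(groups))
    return partition, groups[partition]


if __name__ == "__main__":
    for pk in (1, 2, 3, 42):
        partition, holders = nodes_for(pk)
        print(f"row pk={pk} -> partition {partition}, stored on nodes {holders}")
```

Every row lands in exactly one partition, and every node of the node group that owns this partition stores a copy of it.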

Quote:
Does it happen simultaneously?

Seen from the outside, yes. Internally this happens: the INSERT is part
of a transaction. For this transaction one data node was chosen (or
is chosen now) to act as transaction coordinator (TC). The SQL node
sends the new row to the TC (using the cluster internal signaling
mechanism). The TC determines which data nodes hold the respective
partition and sends the row to them.

The SQL node only talks to the TC. As soon as the TC acknowledges
that it has the new row, everything is fine for the SQL node.

COMMITting is a different story. This requires a full roundtrip
for all participating data nodes (actually two: one for phase 1
and another for phase 2).
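
The flow described above can be sketched roughly as follows. This is a generic textbook two-phase commit with made-up class and method names; it is not NDB's actual signal protocol, and it ignores failures, timeouts and logging.

```python
class DataNode:
    """A data node holding one replica of a partition (illustrative only)."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.pending = {}    # txn_id -> rows received but not yet committed
        self.committed = []  # rows visible after commit

    def receive(self, txn_id, row):
        self.pending.setdefault(txn_id, []).append(row)

    def prepare(self, txn_id):
        # Phase 1: vote "yes" if this node holds the transaction's rows.
        return txn_id in self.pending

    def commit(self, txn_id):
        # Phase 2: make the rows visible.
        self.committed.extend(self.pending.pop(txn_id))


class TransactionCoordinator:
    """The TC: the only component the SQL node talks to."""

    def __init__(self):
        self.participants = {}  # txn_id -> data nodes touched by the transaction

    def insert(self, txn_id, row, replicas):
        # Forward the row to every node that holds the target partition.
        nodes = self.participants.setdefault(txn_id, [])
        for node in replicas:
            node.receive(txn_id, row)
            if node not in nodes:
                nodes.append(node)
        return "ack"  # the SQL node only waits for this acknowledgement

    def commit(self, txn_id):
        nodes = self.participants.pop(txn_id, [])
        # Round trip 1: ask every participant to prepare.
        if not all(node.prepare(txn_id) for node in nodes):
            for node in nodes:
                node.pending.pop(txn_id, None)
            return False
        # Round trip 2: tell every participant to commit.
        for node in nodes:
            node.commit(txn_id)
        return True


if __name__ == "__main__":
    n1, n2 = DataNode(1), DataNode(2)
    tc = TransactionCoordinator()
    tc.insert("t1", {"id": 1}, [n1, n2])
    print(tc.commit("t1"), n1.committed, n2.committed)
```

The INSERT itself costs the SQL node a single exchange with the TC, while the COMMIT has to touch every participating data node twice.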

Look out for the "MySQL Cluster: The Complete Tutorial" and/or
"Design and Internals of MySQL Cluster" papers from Stewart Smith.


XL

#3 - vadim - 11-08-2010, 02:49 PM

Oops, it seems I replied to you by email twice; sorry, I didn't mean
to. I thought something was wrong with Google Groups because I couldn't see
my post in this thread. I think that if you reply, it will be more useful if
the reply is public. So, once again:
Have you used MySQL Cluster in a production environment, and what is your
opinion of it compared to common MySQL? What problems have you run
into, and how did you deal with them?

Thanks in advance.

#4 - Axel Schwenke - 11-09-2010, 03:46 AM

Hi Vadim,

vadim <samokhinvadim (AT) gmail (DOT) com> wrote:

Quote:
Oops, seems I replied to you by email twice, sorry, I didn't mean
to

Oops, looks like my spam filter ate it. I hadn't answered anyway.

Quote:
Have you used MySQL Cluster in production environment and what is your

I am working with MySQL Cluster on a professional basis, yes.

Quote:
opinion about it comparing to common MySQL?

It's a completely different beast. If you try using NDB as a drop-in
replacement for, e.g., InnoDB, then you will be disappointed. Maybe it
was a bad idea to call NDB "MySQL Cluster", because everybody imagines
a traditional cluster of an old-fashioned SQL engine with some shared
storage. NDB is different!

NDB was designed as a shared-nothing, in-memory, real-time database
engine. It distributes data across multiple nodes not only for HA, but also
for performance (scale-out). It is tremendously fast for simple
operations like index lookups. It's pretty bad for complex queries,
like JOINs or full table aggregates.

If you intend to use NDB through the SQL layer, then you cannot
easily leverage its full performance. For a typical 4-node NDB cluster
you will need at least 4 SQL nodes and must distribute your load across
them. Using NDBAPI in your application (for direct access to NDB) will
yield a 3x to 10x increase in performance. Using the asynchronous variant
of NDBAPI will give another factor of ~2.

Quote:
What problems have you run into and how have you got along?

NDB is a diva. You must know what kind of workload you are going to
apply and configure NDB accordingly. If you misconfigure it or if
you overload it, it may simply crash (although mostly with a message
along the lines of "running out of xxx buffer, increase xxx-buffer-size").

But if you configure NDB properly and use it for what it was
designed for, then you can get really impressive performance:

http://johanandersson.blogspot.com/2...-on-pk_11.html


XL

#5 - vadim - 11-09-2010, 03:35 PM

Hi Axel!

So I should use it with simple queries, OK. But it's hard to imagine a
web application without joins -- otherwise there will be a lot of
simple queries. For example, I need to output a list of posts with the
authors' names. So my first query will fetch the list of posts, and the
others -- one query per post -- will retrieve the authors. Is that the
MySQL Cluster way?
And what about caching, by the way? If MySQL Cluster is an in-memory DB
engine, should I use Memcache or anything else?
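
The access pattern in question can be sketched like this. The schema and data are hypothetical, and an in-memory SQLite database stands in for the SQL node purely so the example runs anywhere; it only shows how many statements each approach sends and says nothing about NDB's actual performance.

```python
import sqlite3


def setup():
    """Create a tiny hypothetical posts/authors schema in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
        INSERT INTO authors VALUES (1, 'vadim');
        INSERT INTO authors VALUES (2, 'axel');
        INSERT INTO posts VALUES (10, 1, 'question');
        INSERT INTO posts VALUES (11, 2, 'answer');
        INSERT INTO posts VALUES (12, 1, 'follow-up');
    """)
    return conn


def posts_with_authors_lookups(conn):
    """One query for the posts, then one primary-key lookup per post."""
    queries = 1
    posts = conn.execute(
        "SELECT id, author_id, title FROM posts ORDER BY id"
    ).fetchall()
    result = []
    for _post_id, author_id, title in posts:
        (name,) = conn.execute(
            "SELECT name FROM authors WHERE id = ?", (author_id,)
        ).fetchone()
        queries += 1
        result.append((title, name))
    return result, queries


def posts_with_authors_join(conn):
    """The same result with a single JOIN statement."""
    rows = conn.execute(
        "SELECT p.title, a.name FROM posts p "
        "JOIN authors a ON a.id = p.author_id ORDER BY p.id"
    ).fetchall()
    return rows, 1


if __name__ == "__main__":
    conn = setup()
    print(posts_with_authors_lookups(conn))
    print(posts_with_authors_join(conn))
```

Both functions return the same rows; the first sends one statement per post plus one for the list, the second sends a single statement and leaves the join to the server.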

#7 - Axel Schwenke - 11-10-2010, 02:00 AM

vadim <samokhinvadim (AT) gmail (DOT) com> wrote:
Quote:
On Nov 9, 12:46 pm, Axel Schwenke <axel.schwe... (AT) gmx (DOT) de> wrote:

NDB was designed as shared-nothing, in-memory, realtime database
engine. It distributes on multiple nodes not only for HA, but also
for performance (scale out). It is tremendously fast for simple
operations like index lookups. It's pretty bad for complex queries,
like JOINs or full table aggregates.

BTW, this is a property that NDB shares with other distributed
databases. Complex queries require a lot of network roundtrips
in such architectures and that's the reason why performance
degrades that much (compared to simple queries).

Quote:
... if you configure NDB accordingly and use it for what it was
designed for, then you can get really impressive performance:

So I should use it with simple queries, ok. But it's hard to imagine a
web application without joins

One would hardly choose NDB as a generic engine for the database
behind a web application (except edge cases, of course). But for
subsystems, like session storage, NDB makes a lot of sense.

Quote:
otherwise there will be a lot of
simple queries. For example I need to output a list of posts with the
authors' names.

Another "web forum" software? Don't we have too many of those
already? And what's wrong with Usenet anyway?

Quote:
So my first query will be the list of posts, and the
others -- one query per post for author retrieve. Is that a MySQL
Cluster way?

MySQL isn't very good for hierarchical data (i.e. it does not
have CONNECT BY). Consequently, querying hierarchical data will
require a lot of joins. Thus: "no".

Quote:
And what about cache, by the way? If MySQL Cluster is in-memory db
engine, should I use Memcache or anything else?

NDB is quite comparable to memcached, except that it scales in a
different way (if you have N memcached instances, each holds
potentially the same data; in NDB, data is partitioned) and that
NDB adds real transactions and 100% consistency.

But if you have a small active data set, consistency does not
matter, and you want to scale to hundreds of nodes, then memcached
is very likely the better solution. It all depends on your needs.


XL

#8 - Peter H. Coffin - 11-10-2010, 10:55 AM

On Wed, 10 Nov 2010 09:00:07 +0100, Axel Schwenke wrote:
Quote:
vadim <samokhinvadim (AT) gmail (DOT) com> wrote:
otherwise there will be a lot of
simple queries. For example I need to output a list of posts with the
authors' names.

Another "web forum" software? Don't we have too many of those
already? And what's wrong with usenet anyway?

I'm hoping this is a thought experiment that includes a familiar set of
requirements.

--
The only thing I'd use on guinea-fowl is a shredder. Same with
peacocks. The sound of peacocks being shredded can't possibly be any
worse than the sound of peacocks not being shredded.
-- Tanuki
