dbTalk Databases Forums  

Multiple tables, or a single table with a lot of rows?

comp.databases.mysql comp.databases.mysql


Discuss Multiple tables, or a single table with a lot of rows? in the comp.databases.mysql forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
jwcarlton
 
Posts: n/a

Default Multiple tables, or a single table with a lot of rows? - 01-03-2011 , 06:17 PM






I currently have 6 message boards) that are identical in structure,
but have different data. Each board has 2 tables; one for "subjects",
and one for "posts".

Each of the "posts" tables currently have approximately 1 million
rows, and during peak hours, it's not uncommon to have 500+ queries at
once.

I'm about to launch 100 new boards. Within the next year, I expect
each of them to have at least a few hundred thousand rows, which
brings my total "posts" data to roughly 18 million rows; within the
next 5 years, I would expect it to be > 1 billion rows.

Knowing this, would you guys recommend that I create these new boards
with 100 tables with identical structure (that would be, 100
"subjects" and 100 "posts"), or is there an advantage (in speed and
efficiency) to have them all in a single table, with a new column that
defines which board that row belongs to?

I had originally planned to have them all separate so that, in the
future, if I need to move some to a different server then it would be
simple and easy. But in retrospect, it's possible that running this
database more efficiently would eliminate (or, at least minimize) that
need to move.

TIA,

Jason

Reply With Quote
  #2  
Old   
onedbguru
 
Posts: n/a

Default Re: Multiple tables, or a single table with a lot of rows? - 01-03-2011 , 07:58 PM






On Jan 3, 7:17*pm, jwcarlton <jwcarl... (AT) gmail (DOT) com> wrote:
Quote:
I currently have 6 message boards) that are identical in structure,
but have different data. Each board has 2 tables; one for "subjects",
and one for "posts".

Each of the "posts" tables currently have approximately 1 million
rows, and during peak hours, it's not uncommon to have 500+ queries at
once.

I'm about to launch 100 new boards. Within the next year, I expect
each of them to have at least a few hundred thousand rows, which
brings my total "posts" data to roughly 18 million rows; within the
next 5 years, I would expect it to be > 1 billion rows.

Knowing this, would you guys recommend that I create these new boards
with 100 tables with identical structure (that would be, 100
"subjects" and 100 "posts"), or is there an advantage (in speed and
efficiency) to have them all in a single table, with a new column that
defines which board that row belongs to?

I had originally planned to have them all separate so that, in the
future, if I need to move some to a different server then it would be
simple and easy. But in retrospect, it's possible that running this
database more efficiently would eliminate (or, at least minimize) that
need to move.

TIA,

Jason

As as DBA you need to consider more than just the logical aspect of
your database. Can the overall SYSTEM handle the workload? Again,
that depends on lots of different factors mentioned below (and more).

At the rate you quoted (500queries "at once") for one bbs, you may
very well run into I/O issues long before you reach the target number
of bbs's depending upon your hardware. Is it a PC/Desktop class or a
Server class system... You may run out of horsepower before you get
to 100/100 of anything. This, of course, is dependent upon what
**you** consider "at once" 500 queries/sec /min? /hr? Do you have
any way to test 50000 queries "at once"? Once you have this in place,
how big are the rows (average)? Will it take a few GB or a few
hundred GB or > TB to store the data you have outlined.

If you were to go with the one table with many boards, look at
partitioning that table by that new column and storing each partition
(data AND index) on separate disks. This will improve the
parallelism. Even with RAID5 or RAID0+1, you could very well run into
an I/O channel bandwidth issue, again, depending on the "system". If
possible use a SAN with fiber-channel to maximize your performance.

Look at http://dev.mysql.com/tech-resources/...artitions.html

example:

create table...
PARTITION BY RANGE(id)
( PARTITION p0 VALUES LESS THAN(250000) (
INDEX DIRECTORY = '/u01/orders/i_p0'
DATA DIRECTORY = '/u02/orders/d_p0'
),
PARTITION p1 VALUES LESS THAN(500000) (
INDEX DIRECTORY = '/u03/orders/i_p1'
DATA DIRECTORY = '/u04/orders/d_p1',
),
PARTITION p2 VALUES LESS THAN(MAXVALUE) (
INDEX DIRECTORY = '/u05/orders/i_p2'
DATA DIRECTORY = '/u06/orders/d_p2'
)
etc....
);

Remember, your spindles have only one set of heads and they can only
be in one place at a time so "partitioning" the disk device(s) is
absolutely worthless. Next, how are you going to back this thing up?
You need to begin to think about recoverability of this data.

You have only just begun...

Reply With Quote
  #3  
Old   
jwcarlton
 
Posts: n/a

Default Re: Multiple tables, or a single table with a lot of rows? - 01-03-2011 , 08:38 PM



On Jan 3, 8:58*pm, onedbguru <onedbg... (AT) yahoo (DOT) com> wrote:
Quote:
On Jan 3, 7:17*pm, jwcarlton <jwcarl... (AT) gmail (DOT) com> wrote:



I currently have 6 message boards) that are identical in structure,
but have different data. Each board has 2 tables; one for "subjects",
and one for "posts".

Each of the "posts" tables currently have approximately 1 million
rows, and during peak hours, it's not uncommon to have 500+ queries at
once.

I'm about to launch 100 new boards. Within the next year, I expect
each of them to have at least a few hundred thousand rows, which
brings my total "posts" data to roughly 18 million rows; within the
next 5 years, I would expect it to be > 1 billion rows.

Knowing this, would you guys recommend that I create these new boards
with 100 tables with identical structure (that would be, 100
"subjects" and 100 "posts"), or is there an advantage (in speed and
efficiency) to have them all in a single table, with a new column that
defines which board that row belongs to?

I had originally planned to have them all separate so that, in the
future, if I need to move some to a different server then it would be
simple and easy. But in retrospect, it's possible that running this
database more efficiently would eliminate (or, at least minimize) that
need to move.

TIA,

Jason

As as DBA you need to consider more than just the logical aspect of
your database. *Can the overall SYSTEM handle the workload? Again,
that depends on lots of different factors mentioned below (and more).

At the rate you quoted (500queries "at once") for one bbs, you may
very well run into I/O issues long before you reach the target number
of bbs's depending upon your hardware. *Is it a PC/Desktop class or a
Server class system... *You may run out of horsepower before you get
to 100/100 of anything. This, of course, is dependent upon what
**you** consider "at once" *500 queries/sec /min? /hr? *Do you have
any way to test 50000 queries "at once"? *Once you have this in place,
how big are the rows (average)? *Will it take a few GB or a few
hundred GB or > *TB to store the data you have outlined.

If you were to go with the one table with many boards, look at
partitioning that table by that new column and storing each partition
(data AND index) on separate disks. *This will improve the
parallelism. *Even with RAID5 or RAID0+1, you could very well run into
an I/O channel bandwidth issue, again, depending on the "system". If
possible use a SAN with fiber-channel to maximize your performance.

Look athttp://dev.mysql.com/tech-resources/articles/mysql_5.1_partitions.html

example:

create table...
* PARTITION BY RANGE(id)
* * *( PARTITION p0 VALUES LESS THAN(250000) (
* * * * * INDEX DIRECTORY = '/u01/orders/i_p0'
* * * * * DATA DIRECTORY *= '/u02/orders/d_p0'
* * *),
* * * PARTITION p1 VALUES LESS THAN(500000) (
* * * * * INDEX DIRECTORY = '/u03/orders/i_p1'
* * * * * DATA DIRECTORY *= '/u04/orders/d_p1',
* * * ),
* * * PARTITION p2 VALUES LESS THAN(MAXVALUE) (
* * * * * INDEX DIRECTORY = '/u05/orders/i_p2'
* * * * * DATA DIRECTORY *= '/u06/orders/d_p2'
* * * )
etc....
);

Remember, your spindles have only one set of heads and they can only
be in one place at a time so "partitioning" the disk device(s) is
absolutely worthless. *Next, how are you going to back this thing up?
You need to begin to think about recoverability of this data.

You have only just begun...
I've run in to hardware issues in the past, yes, and have had to
upgrade 3 times. The current server is a Xeon Quad 3.2GHz with 4GB of
RAM, and a second hard drive used for backups (every 24 hours). Each
HDD is 500GB. I upgraded the RAM from 2G about 3 months ago, and I'm
already using every bit of it.

The current database is only about 1GB in size. So while 1 TB isn't
out of the question, I think it's going to be several hardware
upgrades in the future :-)

FWIW, the stats I'm collecting comes from Munin. So, when I say "at
once", I guess I should have stated "queries / second".

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.