dbTalk Databases Forums  

webmail user mailbox schema design?

comp.databases.theory comp.databases.theory


Discuss webmail user mailbox schema design? in the comp.databases.theory forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
meranayaghar@yahoo.com
 
Posts: n/a

Default webmail user mailbox schema design? - 05-01-2009 , 11:06 AM






Hi Folks,
Does anyone have any insight into what user mailbox stores look like
for yahoo, gmail, hotmail, etc? I am curious if they have, e.g. a
mailbox table for each user that contains all their email or whether
they have a single large table with email for all users or whether
they have a smaller subset of tables with the users distributed
between them? Each user can have a large number of messages, so a
single table would end up having billions of records (for one of these
webmail systems with millions of users) and likewise you wouldn't want
separate tables for each user because then you'd have millions of
tables. Any thoughts? Thanks.
Sean

Reply With Quote
  #2  
Old   
johnbhurley@sbcglobal.net
 
Posts: n/a

Default Re: webmail user mailbox schema design? - 05-01-2009 , 11:24 AM






On May 1, 12:06*pm, meranayag... (AT) yahoo (DOT) com wrote:
Quote:
Hi Folks,
*Does anyone have any insight into what user mailbox stores look like
for yahoo, gmail, hotmail, etc? I am curious if they have, e.g. a
mailbox table for each user that contains all their email or whether
they have a single large table with email for all users or whether
they have a smaller subset of tables with the users distributed
between them? Each user can have a large number of messages, so a
single table would end up having billions of records (for one of these
webmail systems with millions of users) and likewise you wouldn't want
separate tables for each user because then you'd have millions of
tables. Any thoughts? Thanks.
Sean
A separate mail box table for each user? Yikes ...

Why don't you start with a proposed ERD for this type of application?
What are the entities involved? What kind of relationships between
the entities?


Reply With Quote
  #3  
Old   
ddf
 
Posts: n/a

Default Re: webmail user mailbox schema design? - 05-01-2009 , 11:30 AM



On May 1, 11:06*am, meranayag... (AT) yahoo (DOT) com wrote:
Quote:
Hi Folks,
*Does anyone have any insight into what user mailbox stores look like
for yahoo, gmail, hotmail, etc? I am curious if they have, e.g. a
mailbox table for each user that contains all their email or whether
they have a single large table with email for all users or whether
they have a smaller subset of tables with the users distributed
between them? Each user can have a large number of messages, so a
single table would end up having billions of records (for one of these
webmail systems with millions of users) and likewise you wouldn't want
separate tables for each user because then you'd have millions of
tables. Any thoughts? Thanks.
Sean
Have you considered using a single, partitioned table for your
messages? It's likely you'll have an ACCOUNTS


Reply With Quote
  #4  
Old   
ddf
 
Posts: n/a

Default Re: webmail user mailbox schema design? - 05-01-2009 , 11:37 AM



On May 1, 11:06*am, meranayag... (AT) yahoo (DOT) com wrote:
Quote:
Hi Folks,
*Does anyone have any insight into what user mailbox stores look like
for yahoo, gmail, hotmail, etc? I am curious if they have, e.g. a
mailbox table for each user that contains all their email or whether
they have a single large table with email for all users or whether
they have a smaller subset of tables with the users distributed
between them? Each user can have a large number of messages, so a
single table would end up having billions of records (for one of these
webmail systems with millions of users) and likewise you wouldn't want
separate tables for each user because then you'd have millions of
tables. Any thoughts? Thanks.
Sean
Have you considered using a single, partitioned table for your
messages? It's likely you'll have an ACCOUNTS table to identify
unique email addresses, an ACCOUNT_DETAILS table to contain profile
information and a MESSAGES table linked to ACCOUNTS via the user_id.
Partitioning the MESSAGES table on user_id could provide speed and
manageability by segregating each users messages to a single
partition; partition pruning would eliminate visits to unrelated
partitions effectively reducing the data set to a fraction of the
total number of messages stored.

It's not a directive, just a thought possibly worth considering.


David Fitzjarrell


Reply With Quote
  #5  
Old   
Tegiri Nenashi
 
Posts: n/a

Default Re: webmail user mailbox schema design? - 05-01-2009 , 12:51 PM



On May 1, 9:37*am, ddf <orat... (AT) msn (DOT) com> wrote:
Quote:
Partitioning the MESSAGES table on user_id could provide speed and
manageability by segregating each users messages to a single
partition; partition pruning would eliminate visits to unrelated
partitions effectively reducing the data set to a fraction of the
total number of messages stored.
I struggle to understand a single advantage of partitioning. Well, you
listed one advantage -- easy deletion. However, since when deletion
became a determining factor in database design?

Let me list disadvantages.

- It is just yet another complication on top of already messy vendor
implementation. A partitioned table is fundamentally a union view;
some database vendors (oracle), however, have decided that it is
something completely different.

- I refuse to understand why a single solution - table - doesn't
scale, and where the demarcation line between "small" and "large"
lies.

- Just look onto the syntax

create table trans (
trans_id number,
trans_dt date,
product_code number,
store_id number,
trans_amount number(12,2)
)
partition by range (trans_dt)
(
partition y05q1 values less than (to_date('04/01/2005','mm/dd/
yyyy')) tablespace y05q1,
partition y05q2 values less than (to_date('07/01/2005','mm/dd/
yyyy')) tablespace y05q2,
partition y05q3 values less than (to_date('10/01/2005','mm/dd/
yyyy')) tablespace y05q3,
partition y05q4 values less than (to_date('01/01/2006','mm/dd/
yyyy')) tablespace y05q4,
partition pmax values less than (maxvalue) tablespace users
)

The idea that DBA have to revisit that table definition every quarter
is just beyond me. Apparently oracle DBA has too little to do than to
tell computer in details how to organize its storage in each and every
minute detail. I propose browser vendors roll out a special DBA
edition where a user has to explicitly allocate heap memory for each
web page.

And I'm not even going into kitchen sink of subpartitions...


Reply With Quote
  #6  
Old   
johnbhurley@sbcglobal.net
 
Posts: n/a

Default Re: webmail user mailbox schema design? - 05-01-2009 , 01:11 PM



On May 1, 1:51*pm, Tegiri Nenashi <TegiriNena... (AT) gmail (DOT) com> wrote:

snip

Quote:
Partitioning the MESSAGES table on user_id could provide speed and
manageability by segregating each users messages to a single
partition; partition pruning would eliminate visits to unrelated
partitions effectively reducing the data set to a fraction of the
total number of messages stored.

I struggle to understand a single advantage of partitioning. Well, you
listed one advantage -- easy deletion. However, since when deletion
became a determining factor in database design?
Well it was just a shot in the dark by ddf with a little bit of
thought but no clear specifications at all from the OP.

I don't really think that being able to delete email messages may be a
viable proposition for most mail providers beyond a very small mom and
pop type of thing that is somehow not getting regulated but could be
wrong here.

Quote:
Let me list disadvantages.

- It is just yet another complication on top of already messy vendor
implementation. A partitioned table is fundamentally a union view;
some database vendors (oracle), however, have decided that it is
something completely different.

- I refuse to understand why a single solution - table - doesn't
scale, and where the demarcation line between "small" and "large"
lies.
Well you would need a clear design and some specific testing
methodology.

Partitioning does seem to get heavily overused and in many cases
thrown in without justification.

Quote:
The idea that DBA have to revisit that table definition every quarter
is just beyond me. Apparently oracle DBA has too little to do than to
tell computer in details how to organize its storage in each and every
minute detail. I propose browser vendors roll out a special DBA
edition where a user has to explicitly allocate heap memory for each
web page.
There's a lot of improvements recently in how much time and effort is
spent managing this area.

Quote:
And I'm not even going into kitchen sink of subpartitions...
Seems like you are going a little off course here from the idea in the
beginning of this thread.



Reply With Quote
  #7  
Old   
Bob Badour
 
Posts: n/a

Default Re: webmail user mailbox schema design? - 05-01-2009 , 01:15 PM



Tegiri Nenashi wrote:

Quote:
On May 1, 9:37 am, ddf <orat... (AT) msn (DOT) com> wrote:

Partitioning the MESSAGES table on user_id could provide speed and
manageability by segregating each users messages to a single
partition; partition pruning would eliminate visits to unrelated
partitions effectively reducing the data set to a fraction of the
total number of messages stored.

I struggle to understand a single advantage of partitioning. Well, you
listed one advantage -- easy deletion. However, since when deletion
became a determining factor in database design?
What's even more striking and even harder to understand is someone
suggesting highly detailed, very explicit physical implementation
details on the basis of the sketchiest of problem statements.

I am sure you would agree: Anyone doing so has no idea what they are
talking about.


Reply With Quote
  #8  
Old   
Michael Austin
 
Posts: n/a

Default Re: webmail user mailbox schema design? - 05-01-2009 , 02:01 PM



Tegiri Nenashi wrote:
Quote:
On May 1, 9:37 am, ddf <orat... (AT) msn (DOT) com> wrote:
Partitioning the MESSAGES table on user_id could provide speed and
manageability by segregating each users messages to a single
partition; partition pruning would eliminate visits to unrelated
partitions effectively reducing the data set to a fraction of the
total number of messages stored.

I struggle to understand a single advantage of partitioning. Well, you
listed one advantage -- easy deletion. However, since when deletion
became a determining factor in database design?
You, my friend have obviously only ever worked on very small databases.
In a DW enviroment, you would be insane to not use partitions. You
really need to get out more often if you can't think of a single
advantage.

In your "shop" you may not ever need this feature, but when you do, you
will learn very quickly to appreciate it

Before being laid off from my last job, we were adding 200+ partitions a
day to a DW that was several hundredTB. When you add hundreds of
millions of rows/day, it is paramount to use partitions.

When you need to query with this much data, being able to restrict your
search to a single or even a couple of partitions - you WILL become a
believer in the use of partitions.

Quote:
Let me list disadvantages.

- It is just yet another complication on top of already messy vendor
implementation. A partitioned table is fundamentally a union view;
some database vendors (oracle), however, have decided that it is
something completely different.

- I refuse to understand why a single solution - table - doesn't
scale, and where the demarcation line between "small" and "large"
lies.

- Just look onto the syntax

create table trans (
trans_id number,
trans_dt date,
product_code number,
store_id number,
trans_amount number(12,2)
)
partition by range (trans_dt)
(
partition y05q1 values less than (to_date('04/01/2005','mm/dd/
yyyy')) tablespace y05q1,
partition y05q2 values less than (to_date('07/01/2005','mm/dd/
yyyy')) tablespace y05q2,
partition y05q3 values less than (to_date('10/01/2005','mm/dd/
yyyy')) tablespace y05q3,
partition y05q4 values less than (to_date('01/01/2006','mm/dd/
yyyy')) tablespace y05q4,
partition pmax values less than (maxvalue) tablespace users
)

The idea that DBA have to revisit that table definition every quarter
is just beyond me. Apparently oracle DBA has too little to do than to
tell computer in details how to organize its storage in each and every
minute detail. I propose browser vendors roll out a special DBA
edition where a user has to explicitly allocate heap memory for each
web page.

And I'm not even going into kitchen sink of subpartitions...

Reply With Quote
  #9  
Old   
ddf
 
Posts: n/a

Default Re: webmail user mailbox schema design? - 05-01-2009 , 02:49 PM



On May 1, 1:15*pm, Bob Badour <bbad... (AT) pei (DOT) sympatico.ca> wrote:
Quote:
Tegiri Nenashi wrote:
On May 1, 9:37 am, ddf <orat... (AT) msn (DOT) com> wrote:

Partitioning the MESSAGES table on user_id could provide speed and
manageability by segregating each users messages to a single
partition; partition pruning would eliminate visits to unrelated
partitions effectively reducing the data set to a fraction of the
total number of messages stored.

I struggle to understand a single advantage of partitioning. Well, you
listed one advantage -- easy deletion. However, since when deletion
became a determining factor in database design?

What's even more striking and even harder to understand is someone
suggesting highly detailed, very explicit physical implementation
details on the basis of the sketchiest of problem statements.

I am sure you would agree: Anyone doing so has no idea what they are
talking about.
I fail to see the 'highly detailed' nature of my response. Please
post the text from that offering where my thought became an explicit
method. Include the DDL I supplied as well.


David Fitzjarrell


Reply With Quote
  #10  
Old   
Tegiri Nenashi
 
Posts: n/a

Default Re: webmail user mailbox schema design? - 05-01-2009 , 03:55 PM



On May 1, 12:01*pm, Michael Austin
Quote:
You, my friend have obviously only ever worked on very small databases.
Define "small"...
"Terradata" -- a spectacular name 10 years ago sounds pretty ordinary
today.

Quote:
In a DW enviroment, you would be insane to not use partitions.
Ah, the one that stubbornly resist indexing. Why indexing by the
record time is such a bad idea? For example, why table Sales
partitioned by the TransactionTime column can't be index organized?

Quote:
When you need to query with this much data, being able to restrict your
search to a single or even a couple of partitions - you WILL become a
believer in the use of partitions.
Again, from query access path perspective partitioning looks like
rudimentary indexing. Because everything that partition pruning does
index range scan does better. Yet, implementation of this rather
unsophisticated idea of chopping one table into many is not stellar: I
have seen optimizer failing to do partition pruning about as often as
it failing to find an indexed access path.



Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.