dbTalk Databases Forums  

webmail user mailbox schema design?

comp.databases.theory comp.databases.theory


Discuss webmail user mailbox schema design? in the comp.databases.theory forum.



Reply
 
Thread Tools Display Modes
  #11  
Old   
Bob Badour
 
Posts: n/a

Default Re: webmail user mailbox schema design? - 05-01-2009 , 04:12 PM






ddf wrote:

Quote:
On May 1, 1:15 pm, Bob Badour <bbad... (AT) pei (DOT) sympatico.ca> wrote:

Tegiri Nenashi wrote:

On May 1, 9:37 am, ddf <orat... (AT) msn (DOT) com> wrote:

Partitioning the MESSAGES table on user_id could provide speed and
manageability by segregating each users messages to a single
partition; partition pruning would eliminate visits to unrelated
partitions effectively reducing the data set to a fraction of the
total number of messages stored.

I struggle to understand a single advantage of partitioning. Well, you
listed one advantage -- easy deletion. However, since when deletion
became a determining factor in database design?

What's even more striking and even harder to understand is someone
suggesting highly detailed, very explicit physical implementation
details on the basis of the sketchiest of problem statements.

I am sure you would agree: Anyone doing so has no idea what they are
talking about.

I fail to see the 'highly detailed' nature of my response. Please
post the text from that offering where my thought became an explicit
method. Include the DDL I supplied as well.

David Fitzjarrell
One does not need to provide DDL to make highly detailed and explicit
design suggestions like partitions.


Reply With Quote
  #12  
Old   
Michael Austin
 
Posts: n/a

Default Re: webmail user mailbox schema design? - 05-01-2009 , 10:31 PM






Tegiri Nenashi wrote:
Quote:
On May 1, 12:01 pm, Michael Austin
You, my friend have obviously only ever worked on very small databases.

Define "small"...
"Terradata" -- a spectacular name 10 years ago sounds pretty ordinary
today.

Terradata is a HARDWARE partitioning of data - at least the last time I
looked... My Oracle DB bigger than a Terradata in the same company -
well, due "economic conditions and poor WS performance" I am no longer
there.. oh well..

Exadata (New Oracle beast) is also a hardware partitioning+db
partitioning and maybe some indexing...

As for the definition of "small"
Medium is < 20TB
Small is < 1TB.
Tiny is < 500G.
IttyBitty is < 50G.

Then there is large and HUGE.



Quote:
In a DW enviroment, you would be insane to not use partitions.

Ah, the one that stubbornly resist indexing. Why indexing by the
record time is such a bad idea? For example, why table Sales
partitioned by the TransactionTime column can't be index organized?
Who's stubborn? Not I. Not only do I use partitioning on things of this
size I also index.

Quote:
When you need to query with this much data, being able to restrict your
search to a single or even a couple of partitions - you WILL become a
believer in the use of partitions.

Again, from query access path perspective partitioning looks like
rudimentary indexing. Because everything that partition pruning does
One could argue that, but they would be wrong.

Quote:
index range scan does better. Yet, implementation of this rather
unsophisticated idea of chopping one table into many is not stellar: I
have seen optimizer failing to do partition pruning about as often as
it failing to find an indexed access path.

While in some cases this is true, you have to learn to finesse the
optimizer. Some of the hints work some of the time, but not always.
Query "building" is not just throwing a bunch of tables together in
JOINS just because that is what is needed, in order to finesse the
optimizer successfully, one must give it something to work with. And to
do that, you have to figure out what it is looking for.

I have written about this before in this forum, but bears repeating.
Have you ever had a query that no matter what index you had or what hint
you had for a given column in the WHERE clause, the CBO just would not
pick that index - (you need to remember that "hints" are just that...
and the CBO can and in some cases DOES ignore them).

While at DEC and working with Rdb Regional Field Support, we had one
such case. We initially "fixed" the problem by duplicating that
statement... Again, this may not work in all cases, but I have solved
similar problems in Oracle version 8/9 and 10 using this method...

Example (terse pseudo-code version):

select a,b,c from a,b,c
where a.d = 'something'
and a.b = 'somethingelse'
and b.a=a.a
and a.a=a.a

Index is on a.d but it just will not use it...

Example (terse "improved" pseudo-code version):

select a,b,c from a,b,c
where
a.d = 'something'
and a.d = 'something'
and a.b = 'somethingelse'
and b.a=a.a
and a.a=a.a

Then the CBO gives a higher ranking to a.d and says "Oh - you really
wanted to use that index". The last time I used this, we had a query
that went from ~20-30 minutes to < 3.

BTW, the world lost a real genius at the passing of Gennady Anteshenkov
- the author of the CBO - although there have been lots of improvements
over the years, without his insight, it would not be where it is today.


Reply With Quote
  #13  
Old   
joel garry
 
Posts: n/a

Default Re: webmail user mailbox schema design? - 05-04-2009 , 04:25 PM



On May 1, 9:06*am, meranayag... (AT) yahoo (DOT) com wrote:
Quote:
Hi Folks,
*Does anyone have any insight into what user mailbox stores look like
for yahoo, gmail, hotmail, etc? I am curious if they have, e.g. a
mailbox table for each user that contains all their email or whether
they have a single large table with email for all users or whether
they have a smaller subset of tables with the users distributed
between them? Each user can have a large number of messages, so a
single table would end up having billions of records (for one of these
webmail systems with millions of users) and likewise you wouldn't want
separate tables for each user because then you'd have millions of
tables. Any thoughts? Thanks.
Sean
I have no idea, but check out http://www.johnvey.com/features/gmailapi/

I interpret this as they simply have either a unix-style mbox with a
searchable index in php or something, or a simple modified open-source
db with the same sort of proprietary interface and indexing. Looking
at the wikipedia entry for gmail, and following it to the guy that
wrote it, found this gem in his blog, which he still writes in so
maybe you can simply ask him:

"I wrote the first version of Gmail in one day. It was not very
impressive. All I did was stuff my own email into the Google Groups
(Usenet) indexing engine. I sent it out to a few people for feedback,
and they said that it was somewhat useful, but it would be better if
it searched over their email instead of mine. That was version two.
After I released that people started wanting the ability to respond to
email as well. That was version three. That process went on for a
couple of years inside of Google before we released to the world."

So it sounds like the answer is, it just uses the regular google
searchable storage with an API on top of that.

Yahoo mail is rocketmail, again, see wikipedia.

Hotmail started on solaris, then migrated to windows. Again,
wikipedia has the history, and it's not pretty.

I wouldn't expect any to use Oracle as a db engine, but one never
knows. It's just mail, after all, what relations would users want?
Mostly they'd be LIFOing or FIFOing and forgetting. So yeah, in this
case deletion would be a primary determinant in design, and if they
use a database, each user would have one or a small number of tables.

So, google is your friend, evil may they be.

jg
--
@home.com is bogus.
http://www.informationweek.com/news/...leID=217201334


Reply With Quote
  #14  
Old   
cimode@hotmail.com
 
Posts: n/a

Default Re: webmail user mailbox schema design? - 05-08-2009 , 08:45 AM



On 2 mai, 05:31, Michael Austin <maus... (AT) firstdbasource (DOT) com> wrote:
Quote:
Tegiri Nenashi wrote:
On May 1, 12:01 pm, Michael Austin
You, my friend have obviously only ever worked on very small databases..

Define "small"...
"Terradata" -- a spectacular name 10 years ago sounds pretty ordinary
today.

Terradata is a HARDWARE partitioning of data - at least the last time I
looked... My Oracle DB bigger than a Terradata in the same company *-
well, due "economic conditions and poor *WS performance" I am no longer
there.. oh well..

Exadata (New Oracle beast) is also a hardware partitioning+db
partitioning and maybe some indexing...

As for the definition of "small"
Medium is < 20TB
Small is < 1TB.
Tiny is < 500G.
IttyBitty is < 50G.

Then there is large and HUGE.
*

In a DW enviroment, you would be insane to not use partitions.

Ah, the one that stubbornly resist indexing. Why indexing by the
record time is such a bad idea? For example, why table Sales
partitioned by the TransactionTime column can't be index organized?

Who's stubborn? *Not I. Not only do I use partitioning on things of this
size I also index.



When you need to query with this much data, being able to restrict your
search to a single or even a couple of partitions - you WILL become a
believer in the use of partitions.

Again, from query access path perspective partitioning looks like
rudimentary indexing. Because everything that partition pruning does

One could argue that, but they would be wrong.

index range scan does better. Yet, implementation of this rather
unsophisticated idea of chopping one table into many is not stellar: I
have seen optimizer failing to do partition pruning about as often as
it failing to find an indexed access path.

While in some cases this is true, you have to learn to finesse the
optimizer. Some of the hints work some of the time, but not always.
Query "building" is not just throwing a bunch of tables together in
JOINS just because that is what is needed, in order to finesse the
optimizer successfully, one must give it something to work with. And to
do that, you have to figure out what it is looking for.

I have written about this before in this forum, but bears repeating.
Have you ever had a query that no matter what index you had or what hint
you had for a given column in the WHERE clause, the CBO just would not
pick that index - (you need to remember that "hints" are just that...
and the CBO can and in some cases DOES ignore them).

While at DEC and working with Rdb Regional Field Support, we had one
such case. We initially "fixed" the problem by duplicating that
statement... *Again, this may not work in all cases, but I have solved
similar problems in Oracle version 8/9 and 10 using this method...

Example (terse pseudo-code version):

select a,b,c from a,b,c
where *a.d = 'something'
and *a.b = 'somethingelse'
and b.a=a.a
and a.a=a.a

Index is on a.d but it just will not use it...

Example (terse "improved" pseudo-code version):

select a,b,c from a,b,c
where
* * *a.d = 'something'
and a.d = 'something'
and *a.b = 'somethingelse'
and b.a=a.a
and a.a=a.a

Then the CBO gives a higher ranking to a.d and says "Oh - you really
wanted to use that index". *The last time I used this, we had a query
that went from ~20-30 minutes to < 3.

BTW, the world lost a real genius at the passing of Gennady Anteshenkov
- the author of the CBO - although there have been lots of improvements
over the years, without his insight, it would not be where it is today.
While I frequently use partitionning to alleviate the unpractical
consequences of administering direct image systems, I would hardly
call partitionning an improvement on a fundamental perspective.

Aside from the ego battle on *who knows best* or *who has dealt with
the biggest db*, I do not understand the fundamental interest in
opposing the subject of *logical* access optimization and the subject
of partitionning, a matter exclusively related to direct image system
physical layer, on the same standpoint.

This SQL-DBMS only debate simply ignores the principle of independence
between the two layers dictated by RM. In other words, partitionning
is simply a limitation imposed by direct image systems and so are
current indexing schemes.

I would find it more interesting to discuss how building a TRDBMS
would make the need of partitionning data physical representations an
unneeded feature.

My two cents...

Regards...


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.