Quote:
I currently have over 100,000 active users, so the number of
directories has become a bit cumbersome. My server has a limit of
|
30,000 subdirectories per directory (which I think is a Linux
limitation), so I had to create a patch to create new directories
for new users, then a way for people to determine which path stores
the data for their PM recipient. WAY overcomplicated!
It is a fairly common technique to use directories of the form
%c/%c/%s
where the two %c's are letters from the user's name (e.g. first and second, or
first letter of first name and first letter of last name, or some other
formula. Be sure to deal with what those two letters are should someone
have a single-character last name.) , and the %s represents the full name.
It works even better with numbers, with the two higher-level directories
being the *LAST* two digits of the overall number. Why? It distributes
the user directories fairly evenly under the higher-level directories.
Whereas with names, there may be nothing in the Q/Z directory and
lots in the S/M directory (such as all the "Smiths").
Quote:
username
folder (name of the folder)
status (boolean, read or unread)
timestamp
message
attachment (delimited list that links to an attachment)
There would be an index on the username, folder, status, and
timestamp. It would realistically have the same username listed a
|
bunch of times; if someone has 1,000 messages in all of their folders
combined, then there would be 1,000 rows with the same username.
It might be easier to introduce the concept of a "user id" here.
It's a number, and replaces the user name in the message table, and
is added to the user table. You look up the username in the user
table to get the user number, then look up the user number to get
to the messages. Why? Looking up by numeric index (is a 32-bit
integer enough for all the users? Is a limit of 2 billion or 4
billion users a problem?) is faster and usually takes less storage
than strings, especially in the message table.
MySQL will use only one index at a time for a query. I suggest that
you need compound indexes like (username, folder), (username, status),
and (username, timestamp) for most operations involving a single user
looking at his own messages. Or perhaps (username, folder, status)
and (username, folder, timestamp).
Quote:
Realistically, this table would have no less than a million rows,
and could easily grow to several million within a year or two.
|
Quote:
In addition, the current "user" table would have a new column
that would be a delimited list of the folders created for the
|
account. This COULD be removed and just search for all messages
under a username, and add the folder names to an array, but then
it wouldn't list folders that are temporarily empty.
Having a table with a delimited list of something doesn't generally
work very well. I'd suggest a table with the fields userid and
folder, with multiple entries for multiple folders. It's much easier
to figure out whether a user has a specific-named folder without
having to parse through the delimiters.
Quote:
What do you guys think? Is this a smart structure for a database
like this, or do you foresee pitfalls that would suggest a different
|
structure?