dbTalk Databases Forums  

Automatically archive older rows?

comp.databases.mysql comp.databases.mysql


Discuss Automatically archive older rows? in the comp.databases.mysql forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Jason C
 
Posts: n/a

Default Automatically archive older rows? - 12-12-2011 , 11:15 PM






This is for a table that holds message board posts.

I was planning to set up an `archive` table (identical to the `posts` table), and then set up a cron to automatically copy all rows "WHERE postdate < 20110912235959" (11:59:59pm, 3months ago) to `archive`, then delete those same rows from `posts`. This way, `posts` would stay nice and small.

The only problem in doing this is if the user reads a thread that includes posts that are more than 90 days old. These are fairly rare, but do exist. Because of these, I would need to modify my program to recognize when a thread includes those older posts, then make it JOIN appropriately.

Before I go through all of that work, I'm curious if there's an easier way to accomplish the same thing (or similar)? I had read up on partitions, andI could write the cron to run every 90 days and just create a partition first, but would MySQL automatically query partitions when they're needed?

Reply With Quote
  #2  
Old   
Tony Mountifield
 
Posts: n/a

Default Re: Automatically archive older rows? - 12-13-2011 , 04:13 AM






In article <1961950.1243.1323753318718.JavaMail.geo-discussion-forums@yqfm20>,
Jason C <comp.databases.mysql (AT) googlegroups (DOT) com> wrote:
Quote:
This is for a table that holds message board posts.

I was planning to set up an `archive` table (identical to the `posts` table), and then set
up a cron to automatically copy all rows "WHERE postdate < 20110912235959" (11:59:59pm,
3months ago) to `archive`, then delete those same rows from `posts`. This way, `posts` would
stay nice and small.

The only problem in doing this is if the user reads a thread that includes posts that are
more than 90 days old. These are fairly rare, but do exist. Because of these, I would need
to modify my program to recognize when a thread includes those older posts, then make it
JOIN appropriately.

Before I go through all of that work, I'm curious if there's an easier way to accomplish the
same thing (or similar)? I had read up on partitions, and I could write the cron to run
every 90 days and just create a partition first, but would MySQL automatically query
partitions when they're needed?
Jason, why do you need to keep `posts` "nice and small"? If the `posts` table
is correctly indexed, then access to the required records will still be fast
even if the table becomes "nasty and big"!

Moving records from `posts` to `archive` is just unnecessary effort to create
a problem that you then need to solve. The correct solution is not to do it
in the first place.

Now, if your user interface wants to present older posts to the users as
"archived", then all you need to do is to add an archived flag to `posts`
and update it as required. But if the criteria for archival is just a fixed
age of the post, you don't even need such a flag, and can just calculate
the archived status at run time based on the postdate, e.g.

SELECT ...,(postdate < DATE_SUB(CURRENT_DATE, INTERVAL 3 MONTH)) AS archived,...

or

WHERE postdate < DATE_SUB(CURRENT_DATE, INTERVAL 3 MONTH)

if you are just selecting archived posts (use >= if selecting only non-archived).

Cheers
Tony
--
Tony Mountifield
Work: tony (AT) softins (DOT) co.uk - http://www.softins.co.uk
Play: tony (AT) mountifield (DOT) org - http://tony.mountifield.org

Reply With Quote
  #3  
Old   
Doug Miller
 
Posts: n/a

Default Re: Automatically archive older rows? - 12-13-2011 , 05:55 AM



On 12/13/2011 12:15 AM, Jason C wrote:
Quote:
This is for a table that holds message board posts.

I was planning to set up an `archive` table (identical to the `posts` table), and then set up a cron to automatically copy all rows "WHERE postdate< 20110912235959" (11:59:59pm, 3months ago) to `archive`, then delete those same rows from `posts`. This way, `posts` would stay nice and small.

Why? What is the point? Clearly the size of 'posts' + the size of
'archive', if you do this, will be the same as the size of 'posts' alone
if you don't. So you're not saving any storage space *at all*.

Meanwhile, you're creating marginally more work for yourself in creating
the archive...

Quote:
The only problem in doing this is if the user reads a thread that includes posts that are more than 90 days old. These are fairly rare, but do exist. Because of these, I would need to modify my program to recognize when a thread includes those older posts, then make it JOIN appropriately.

.... and creating a boatload more work for yourself in accessing it.

Quote:
Before I go through all of that work, I'm curious if there's an easier way to accomplish the same thing (or similar)? I had read up on partitions, and I could write the cron to run every 90 days and just create a partition first, but would MySQL automatically query partitions when they're needed?
Of course there is: leave it alone.

Think this through. What problem are you trying to solve?

If the problem you're trying to solve is that as the 'posts' table grows
larger and larger, retrieval of data from it gets slower and slower,
then the solution is to index the table properly and/or write more
efficient queries.

If the "problem" you're trying to solve is that the table is bigger than
you think it "should" be... relax. Don't worry about it. Big tables
don't bother the DBMS; they shouldn't bother you.

If the "problem" is something else, describe what you think it is.

Reply With Quote
  #4  
Old   
Álvaro G. Vicario
 
Posts: n/a

Default Re: Automatically archive older rows? - 12-13-2011 , 06:43 AM



El 13/12/2011 6:15, Jason C escribió/wrote:
Quote:
This is for a table that holds message board posts.

I was planning to set up an `archive` table (identical to the `posts`
table), and then set up a cron to automatically copy all rows "WHERE
postdate< 20110912235959" (11:59:59pm, 3months ago) to `archive`,
then delete those same rows from `posts`. This way, `posts` would
stay nice and small.

The only problem in doing this is if the user reads a thread that
includes posts that are more than 90 days old. These are fairly rare,
but do exist. Because of these, I would need to modify my program to
recognize when a thread includes those older posts, then make it JOIN
appropriately.

Before I go through all of that work, I'm curious if there's an
easier way to accomplish the same thing (or similar)? I had read up
on partitions, and I could write the cron to run every 90 days and
just create a partition first, but would MySQL automatically query
partitions when they're needed?
I don't know what you read about partitions but they do the opposite
think: they allow you to keep all your historical data in a single
table. Partitions are a tool to fine-grain storage details for very
large tables, e.g., split our single table into different disk partitions.

http://dev.mysql.com/doc/refman/5.5/...titioning.html


--
-- http://alvaro.es - Álvaro G. Vicario - Burgos, Spain
-- Mi sitio sobre programación web: http://borrame.com
-- Mi web de humor satinado: http://www.demogracia.com
--

Reply With Quote
  #5  
Old   
Peter H. Coffin
 
Posts: n/a

Default Re: Automatically archive older rows? - 12-13-2011 , 07:22 AM



On Mon, 12 Dec 2011 21:15:18 -0800 (PST), Jason C wrote:

Quote:
This is for a table that holds message board posts.

I was planning to set up an `archive` table (identical to the `posts`
table), and then set up a cron to automatically copy all rows "WHERE
postdate < 20110912235959" (11:59:59pm, 3months ago) to `archive`,
then delete those same rows from `posts`. This way, `posts` would stay
nice and small.

The only problem in doing this is if the user reads a thread that
includes posts that are more than 90 days old. These are fairly rare,
but do exist. Because of these, I would need to modify my program to
recognize when a thread includes those older posts, then make it JOIN
appropriately.

Before I go through all of that work, I'm curious if there's an
easier way to accomplish the same thing (or similar)? I had read up
on partitions, and I could write the cron to run every 90 days and
just create a partition first, but would MySQL automatically query
partitions when they're needed?
Don't bother. Seriously, you're complicating something for no particular
benefit, it'll only make things harder to program and harder to find,
and even if you're absolutely determined to do a lot of unnecessary
work, the right way to handle it is to periodically scan all the message
threads, find the ones in posts that have ALL of the messages in them
older than three months, and archive the whole thread at once. That way
active things, which are what people are likely to read anyway, all stay
in your active pool.

--
68. I will spare someone who saved my life sometime in the past. This is
only reasonable as it encourages others to do so. However, the offer
is good one time only. If they want me to spare them again, they'd
better save my life again. --Peter Anspach's Evil Overlord list

Reply With Quote
  #6  
Old   
Jason C
 
Posts: n/a

Default Re: Automatically archive older rows? - 12-13-2011 , 04:00 PM



On Tuesday, December 13, 2011 8:22:09 AM UTC-5, Peter H. Coffin wrote:
Quote:
Don't bother. Seriously, you're complicating something for no particular
benefit, it'll only make things harder to program and harder to find,
and even if you're absolutely determined to do a lot of unnecessary
work, the right way to handle it is to periodically scan all the message
threads, find the ones in posts that have ALL of the messages in them
older than three months, and archive the whole thread at once. That way
active things, which are what people are likely to read anyway, all stay
in your active pool.
That's not a bad plan, and doable.

I'm really not trying to solve a problem, just trying to do some "prematureoptimization", as Jerry likes to say :-)

I currently have 38 `posts` tables, for 38 separate forums. I'm planning tomerge these in to one table for the sake of making moderating easier, withan added column to designate to which forum the post belongs. So, where mylargest `posts` table currently has 1.4 million rows, this will increase it to around 10 million, with a potential growth of several million a year.

I figured that reducing the table from 10 million rows to closer to 750,000would help speed things up a little, and help prevent future slow queries when it becomes 15, 20, or 30 million rows.

Currently, in the table with 1.4 million rows, a normal query only takes 0.0005 seconds, so this is probably moot, anyway. I mean, it could be 1,000 times slower, and still not be noticeable to the regular user. But, this is my typical slow season, so I just figured that it's a good time to mess around.

With that in mind, do you still think that minimizing the table size is pointless?

Reply With Quote
  #7  
Old   
Peter H. Coffin
 
Posts: n/a

Default Re: Automatically archive older rows? - 12-13-2011 , 08:37 PM



On Tue, 13 Dec 2011 14:00:12 -0800 (PST), Jason C wrote:

Quote:
On Tuesday, December 13, 2011 8:22:09 AM UTC-5, Peter H. Coffin wrote:

Don't bother. Seriously, you're complicating something for no
particular benefit, it'll only make things harder to program and
harder to find, and even if you're absolutely determined to do a lot
of unnecessary work, the right way to handle it is to periodically
scan all the message threads, find the ones in posts that have ALL of
the messages in them older than three months, and archive the whole
thread at once. That way active things, which are what people are
likely to read anyway, all stay in your active pool.

That's not a bad plan, and doable.

I'm really not trying to solve a problem, just trying to do some
"premature optimization", as Jerry likes to say :-)
Don't do that.

Quote:
I currently have 38 `posts` tables, for 38 separate forums. I'm
planning to merge these in to one table for the sake of making
moderating easier, with an added column to designate to which forum
the post belongs. So, where my largest `posts` table currently has
1.4 million rows, this will increase it to around 10 million, with a
potential growth of several million a year.

I figured that reducing the table from 10 million rows to closer to
750,000 would help speed things up a little, and help prevent future
slow queries when it becomes 15, 20, or 30 million rows.

Currently, in the table with 1.4 million rows, a normal query only
takes 0.0005 seconds, so this is probably moot, anyway. I mean, it
could be 1,000 times slower, and still not be noticeable to the
regular user. But, this is my typical slow season, so I just figured
that it's a good time to mess around.

With that in mind, do you still think that minimizing the table size
is pointless?
Yes. Because querying the table, if you're getting results back that
fast isn't going to take 1,000 times as long. You've clearly got at
least reasonable indexes built on the thing to return in that amount of
time. The amount of time the query took didn't increase 1,000-fold each
of the PREVIOUS times you increased the row count eight-fold, did it?
The two commonest storage engines used both have binary indexes, which
means finding rows takes worst case log2(N) probes. If that makes no
sense, you can think of it as (and yes, pedants, this is VERY roughly)
for every doubling of the number of rows you have, it takes one more
check of the index. Right now, it takes about 20 checks of the index
to find the right rown in 1.4 million, which happens in that 0.0005
seconds. Going to 10 million will take maybe 23 checks of the index.
Which means you can expect that it might take up to 0.0007 seconds to
find the row. And there's ways that mysql can cheat that one too.

--
39. If I absolutely must ride into battle, I will certainly not ride at
the forefront of my Legions of Terror, nor will I seek out my
opposite number among his army.
--Peter Anspach's list of things to do as an Evil Overlord

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.