Extremely slow when paging occurs
01-06-2006, 10:07 PM
Thanks for your advice. Patrick's suggestion worked for my sorted
input case. The elapsed time went down significantly, to 15 minutes for
50M rows.
However, with randomly ordered input keys, it slows down again.
Any suggestions on how to work out the cache size needed to hold all
the dirty pages?
If I have multiple DB writers writing to different DBs in one ENV
concurrently, are there any cache tuning suggestions?
My test case:
I have a file with 10 million records. Each record has 10 columns, each
holding up to 20 bytes of characters. Values are generated by rand().
Assume all values are different.
I want to write a program that finds the distinct values in each column
and their counts. Here is what I did:
- Open an env with roughly 1.4 GB of cache (1 GB + 400 MB, in one region):
G_BDB_ENV = new DbEnv(0);
string dbHome = "G:/temp/bdb";
int ret = G_BDB_ENV->set_cachesize(1, 400*1024*1024, 1);
ret = G_BDB_ENV->open(dbHome.c_str(),
DB_INIT_CDB|DB_PRIVATE|DB_INIT_MPOOL|DB_CREATE|DB_THREAD, 0);
- Open 10 Btree DBs, one for each column:
bdb = new Db(G_BDB_ENV, 0);
string fileName = string(RepConn::generateGUID()) + ".db";
bdb->set_pagesize(32*1024);
bdb->open(NULL, fileName.c_str(), NULL, DB_BTREE, DB_CREATE |
DB_THREAD, 0);
bdb->cursor(NULL, &cursor, DB_WRITECURSOR);
- For each value, look it up in its DB. If it exists, update the
record with its count + 1. Otherwise, insert a record with count = 1.
Dbt key, data;
I4 row_count = 0;
memset(&key, 0, sizeof(DBT));
memset(&data, 0, sizeof(DBT));
key.set_data( value+4);
key.set_size( *(I4 *)value);
data.set_data(&row_count);
data.set_size(sizeof(I4));
data.set_ulen(sizeof(I4));
data.set_flags(DB_DBT_USERMEM);
int ret = m_cursor->get(&key, &data, DB_SET);
// 1.3 if not found, insert a new record
if (ret == DB_NOTFOUND)
{
row_count = 1;
m_cursor->put(&key, &data, DB_KEYLAST);
m_distinct_count++;
}
// 1.4 if found, increase the row count and overwrite the current record
else if (ret == 0)
{
row_count++;
m_cursor->put(&key, &data, DB_CURRENT);
}
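The per-value logic above can be sketched in a form that runs without Berkeley DB. This is only an illustration: std::map stands in for one column's Btree, and the ColumnCounter name and its members are hypothetical, not part of the program in this post. The comments show which cursor call each branch corresponds to.

```cpp
#include <cstdint>
#include <map>
#include <string>

// In-memory stand-in for one column's Btree: key = column value, data = count.
// std::map is an assumption made so the sketch runs without Berkeley DB.
struct ColumnCounter {
    std::map<std::string, std::int32_t> counts;  // replaces the Db handle
    std::int64_t distinct_count = 0;

    // Mirrors the lookup-then-put flow: insert with count 1 if the key is
    // missing, otherwise overwrite the existing record with count + 1.
    void add(const std::string& value) {
        auto it = counts.find(value);    // corresponds to get(..., DB_SET)
        if (it == counts.end()) {        // the "not found" case
            counts.emplace(value, 1);    // corresponds to put(..., DB_KEYLAST)
            ++distinct_count;
        } else {
            ++it->second;                // corresponds to put(..., DB_CURRENT)
        }
    }
};
```

Calling add() once per column value leaves distinct_count equal to the number of distinct keys and counts holding the per-value totals, which is what the ten Btree DBs are meant to contain at the end of the run.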
I have 4 threads running on my 4-CPU machine with 4 GB of RAM.
The performance was acceptable until it started to page. After that it
is very slow, and a run can easily take 24 hours.
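To put the paging in context, here is a rough back-of-the-envelope sketch comparing the configured cache with an estimate of the Btree size. The function names are hypothetical, and the constants (about 26 bytes of overhead per page, about 10 bytes of overhead per key/data pair, and a fill factor near 0.5 for random inserts) are assumptions based on general Berkeley DB sizing guidance, not measured values.

```cpp
#include <cstdint>

// Total cache requested by set_cachesize(gbytes, bytes, ncache):
// the two size arguments are added together.
constexpr std::uint64_t cache_bytes(std::uint64_t gbytes, std::uint64_t bytes) {
    return gbytes * 1024ULL * 1024ULL * 1024ULL + bytes;
}

// Rough size estimate for one Btree database.
// Assumed constants: 26 bytes of page overhead, 10 bytes of overhead per
// key/data pair. fill_factor is the assumed fraction of each page in use.
std::uint64_t estimate_btree_bytes(std::uint64_t n_records,
                                   std::uint64_t key_bytes,
                                   std::uint64_t data_bytes,
                                   std::uint64_t page_size,
                                   double fill_factor) {
    const double useful_per_page =
        static_cast<double>(page_size - 26) * fill_factor;
    const double payload =
        static_cast<double>(n_records) *
        static_cast<double>(key_bytes + data_bytes + 10);
    // Round up to whole pages.
    const std::uint64_t pages =
        static_cast<std::uint64_t>(payload / useful_per_page) + 1;
    return pages * page_size;
}
```

With the numbers from this post (10,000,000 records, 20-byte keys, 4-byte counts, 32 KB pages, fill factor 0.5), the estimate comes to roughly 0.7 GB per column database, so about 7 GB across the ten databases, against about 1.4 GB of configured cache. If those assumptions are anywhere near right, random-key inserts would touch far more pages than the cache can hold.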
Did I do anything wrong here?
Thanks
David