#11
please take a look at my source code, 1024*128 items have been inserted. if there is only one item in the db, why the db file is so large?
#12
likun.navipal (AT) gmail (DOT) com wrote:
> please take a look at my source code, 1024*128 items have been inserted.
> if there is only one item in the db, why the db file is so large?

Oops, my mistake: you're using subdatabases. So the single 'item' shown by db_stat -d is the subdatabase in which all your data is actually held. Unless you have a particular reason for using subdatabases (bundling of multiple indexes, etc.), I would recommend passing NULL as the 'database' argument to DB->open(). Subdatabases add a couple of pages of overhead of their own, plus the extra complexity.

For now, you can get the stats for the _real_ B-tree by using both the -d option and the -s option. The latter should be given the name of the subdatabase, as stored in the 'table_name' variable in your code. Since you're using a real environment, you'll also want to specify the path to the environment home using the -h option, à la:

    db_stat -h db_home -d table_file_name -s table_name

Btw, I hope your code was intended as a sample only. It's obviously incomplete (several variable declarations are missing), but if you really did leave out the DB->close() and DB_ENV->close() calls then you're going to have all sorts of problems. The Concurrent Data Store mode enabled by the DB_INIT_CDB flag does _not_ guarantee recoverability after an unclean close.
Anyway, when I take your code, add the missing bits, run it with the default data size of 1024, and then run the db_stat command above on the result, I get:

    Fri Jun 2 00:09:45 2006   Local time
    53162                     Btree magic number
    9                         Btree version number
    Little-endian             Byte order
    multiple-databases        Flags
    2                         Minimum keys per-page
    4096                      Underlying database page size
    3                         Number of levels in the tree
    131072                    Number of unique keys in the tree
    131072                    Number of data items in the tree
    30                        Number of tree internal pages
    56764                     Number of bytes free in tree internal pages (53% ff)
    2157                      Number of tree leaf pages
    4064398                   Number of bytes free in tree leaf pages (53% ff)
    0                         Number of tree duplicate pages
    0                         Number of bytes free in tree duplicate pages (0% ff)
    131072                    Number of tree overflow pages
    399M                      Number of bytes free in tree overflow pages (25% ff)
    0                         Number of empty pages
    0                         Number of pages on the free list

So, a 53% fill-factor for the tree itself and 25% for the overflow pages. That makes sense: the low fill-factor for the tree is caused by the out-of-order insertion, while the low fill-factor for the overflow pages is a direct result of the 4kB page size with 1kB items. Indeed, for 4kB pages, data items of 1kB are pretty much pessimal: if they were smaller they wouldn't be put on overflow pages, and they could grow to almost 4kB in size without using any additional file space.

We can confirm the "out-of-order causes 53% ff" deduction easily enough by inserting the entries in order, which only requires changing the sprintf() format to "aaaa_key_%06d". The average key length will actually increase with that, but when we run it again and check db_stat, the relevant lines now show:

    10                        Number of tree internal pages
    5368                      Number of bytes free in tree internal pages (86% ff)
    1171                      Number of tree leaf pages
    47378                     Number of bytes free in tree leaf pages (99% ff)

Yep, 1/3 the internal pages and about 1/2 the leaf pages compared to the out-of-order insertion, despite the larger average key size.

Now, what to do about the overflow pages?
Well, increasing the page size would permit all the values to go on the primary pages, where they would be packed together. So, let's increase the page size from 4kB to 16kB using DB->set_pagesize() and see what happens:

    Fri Jun 2 01:15:59 2006   Local time
    53162                     Btree magic number
    9                         Btree version number
    Little-endian             Byte order
    multiple-databases        Flags
    2                         Minimum keys per-page
    16384                     Underlying database page size
    3                         Number of levels in the tree
    131072                    Number of unique keys in the tree
    131072                    Number of data items in the tree
    19                        Number of tree internal pages
    29476                     Number of bytes free in tree internal pages (90% ff)
    9363                      Number of tree leaf pages
    15M                       Number of bytes free in tree leaf pages (90% ff)
    0                         Number of tree duplicate pages
    0                         Number of bytes free in tree duplicate pages (0% ff)
    0                         Number of tree overflow pages
    0                         Number of bytes free in tree overflow pages (0% ff)
    0                         Number of empty pages
    0                         Number of pages on the free list

Poof! No more overflow pages with their low fill-factor, and the file was only 146.64MB compared to the 516.625MB file of the original 4kB page, out-of-order insertion version.

Philip Guenther