File system fragmentation

comp.databases.berkeley-db

Florian Weimer
File system fragmentation - 07-16-2008, 02:39 PM

I tested this again after a couple of years, and the behavior doesn't
seem to have changed: If a Berkeley DB table is written using TDS with a
reasonably sized cache, data is written from the cache to the file
system in what appears to be a random order. Apparently, a lot of
holes are created, which are then filled. This degrades file system
performance and makes hot backups somewhat difficult (because the read
performance is a fraction of what can actually be achieved).
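To make the "holes" concrete, here is a minimal sketch of what happens when pages reach the file out of file order. It assumes a hypothetical 4 KB page size and plain POSIX `pwrite`. The names `count_hole_creating_writes`, `write_page` and `size_after_single_write` are illustrative only and are not Berkeley DB functions. The page orders fed to the sketch are made up; they are not taken from BDB's cache.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define DEMO_PAGE_SIZE 4096  /* hypothetical page size */

/* Count writes that start past the current end of file. Each such write
 * leaves a gap behind it (a hole on file systems with sparse files). */
static unsigned count_hole_creating_writes(const unsigned *order, unsigned n)
{
    unsigned eof_pages = 0, holes = 0;
    assert(order != NULL || n == 0);
    for (unsigned i = 0; i < n; i++) {
        if (order[i] > eof_pages)
            holes++;                   /* pages eof_pages..order[i]-1 skipped */
        if (order[i] + 1 > eof_pages)
            eof_pages = order[i] + 1;  /* file now ends after this page */
    }
    return holes;
}

/* Write one page filled with `fill` at page number `pgno`. */
static int write_page(int fd, unsigned pgno, unsigned char fill)
{
    unsigned char buf[DEMO_PAGE_SIZE];
    memset(buf, fill, sizeof buf);
    ssize_t n = pwrite(fd, buf, sizeof buf, (off_t)pgno * DEMO_PAGE_SIZE);
    return n == (ssize_t)sizeof buf ? 0 : -1;
}

/* Write only page `pgno` to a fresh scratch file and return the file size. */
static off_t size_after_single_write(unsigned pgno)
{
    char path[] = "/tmp/bdb-hole-demoXXXXXX";
    int fd = mkstemp(path);
    off_t size = -1;
    struct stat st;
    if (fd < 0)
        return -1;
    unlink(path);                      /* removed once fd is closed */
    if (write_page(fd, pgno, 0xAB) == 0 && fstat(fd, &st) == 0)
        size = st.st_size;             /* pages before pgno read as zeros */
    close(fd);
    return size;
}
```

Writing page 7 first makes the file eight pages long at once. Pages 0 to 6 read back as zeros. On file systems with sparse-file support, those pages have no blocks allocated until later writes fill them in, which matches the fill-in-the-holes pattern described above.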

Is there still no way to preallocate the contents of B-tree files?

(Without TDS, the problem disappears; it seems to be related to TDS or the
cache size.)
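On the preallocation question, the thread does not name a Berkeley DB option for it. At the operating-system level, preallocating usually means reserving blocks before any data is written. The following minimal sketch does this with POSIX `posix_fallocate` on a scratch file, using a hypothetical page count and page size. The helper names are illustrative. Whether Berkeley DB 4.7 would accept a database file extended this way outside its own control is not established here, so treat this as an illustration of the idea, not a procedure for a live B-tree file.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Reserve space for `npages` pages of `pagesize` bytes starting at offset 0.
 * Returns 0 or an error number (posix_fallocate does not set errno). */
static int preallocate_pages(int fd, unsigned npages, unsigned pagesize)
{
    assert(pagesize > 0);
    return posix_fallocate(fd, 0, (off_t)npages * pagesize);
}

/* Preallocate a fresh scratch file and report its resulting size. */
static off_t scratch_preallocated_size(unsigned npages, unsigned pagesize)
{
    char path[] = "/tmp/bdb-prealloc-demoXXXXXX";
    int fd = mkstemp(path);
    off_t size = -1;
    struct stat st;
    if (fd < 0)
        return -1;
    unlink(path);                      /* removed once fd is closed */
    if (preallocate_pages(fd, npages, pagesize) == 0 && fstat(fd, &st) == 0)
        size = st.st_size;             /* blocks reserved; contents read as zeros */
    close(fd);
    return size;
}
```

On file systems without native support, the C library may emulate `posix_fallocate` by writing zeros, which is slower but still leaves the range allocated.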

michael.cahill@gmail.com
Re: File system fragmentation - 07-21-2008, 07:44 AM

> (Without TDS, the problem disappears; it seems to be related to TDS or the
> cache size.)

I believe it's purely related to cache size, not to TDS. The issue is
Berkeley DB's approximation to LRU in the cache. We don't maintain a
pure LRU list because it's a concurrency bottleneck.

You could try defining HAVE_FILESYSTEM_NOTZERO in db_config.h, and
changing the __os_fs_notzero function to return 1. This won't change
the order in which pages are flushed from the cache, but will ensure
that the file grows without holes.
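The effect described here, a file that grows without holes, can be illustrated with a sketch. It zero-fills every page between the current end of file and the page being written, and only then writes that page. This is not Berkeley DB's code. `write_page_no_holes`, `open_scratch_file` and the 4 KB page size are hypothetical. The actual change is the `db_config.h` / `__os_fs_notzero` edit described above. The sketch also assumes the file length is a multiple of the page size.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

#define DEMO_PAGE_SIZE 4096  /* hypothetical page size */

/* Open an anonymous scratch file (unlinked at once); returns fd or -1. */
static int open_scratch_file(void)
{
    char path[] = "/tmp/bdb-nohole-demoXXXXXX";
    int fd = mkstemp(path);
    if (fd >= 0)
        unlink(path);
    return fd;
}

/* Write `page` at `pgno`, first zero-filling every page between the current
 * end of file and `pgno` so that no hole is left behind.
 * Returns the number of zero pages written, or -1 on error. */
static long write_page_no_holes(int fd, unsigned pgno, const unsigned char *page)
{
    static const unsigned char zeros[DEMO_PAGE_SIZE];
    struct stat st;
    long filled = 0;
    assert(page != NULL);
    if (fstat(fd, &st) != 0)
        return -1;
    /* Fill the gap [EOF page, pgno) with explicit zero pages. */
    for (off_t p = st.st_size / DEMO_PAGE_SIZE; p < (off_t)pgno; p++, filled++)
        if (pwrite(fd, zeros, DEMO_PAGE_SIZE, p * DEMO_PAGE_SIZE) != DEMO_PAGE_SIZE)
            return -1;
    if (pwrite(fd, page, DEMO_PAGE_SIZE, (off_t)pgno * DEMO_PAGE_SIZE) != DEMO_PAGE_SIZE)
        return -1;
    return filled;
}
```

The cost is extra writes for the zero pages. The order in which pages leave the cache is unchanged, which is consistent with the note that flush order stays the same.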

Please let us know if this does make a difference.

Regards,
Michael Cahill, Oracle.


Florian Weimer
Re: File system fragmentation - 08-02-2008, 03:02 PM

* michael cahill:

>> (Without TDS, the problem disappears; it seems to be related to TDS or the
>> cache size.)
>
> I believe it's purely related to cache size, not to TDS. The issue is
> Berkeley DB's approximation to LRU in the cache. We don't maintain a
> pure LRU list because it's a concurrency bottleneck.

Couldn't you write all previous dirty pages (in file order) when you
extend a database?
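The suggestion above can be sketched as follows. Before a page past the end of file is written, every dirty page with a lower page number is flushed first, sorted by page number so the writes land in file order. `struct dirty_page`, `flush_below_in_file_order` and the page size are hypothetical stand-ins for a buffer cache, not Berkeley DB's mpool structures. The sketch ignores locking and concurrent writers.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

#define DEMO_PAGE_SIZE 4096  /* hypothetical page size */

struct dirty_page {                 /* hypothetical cache buffer */
    unsigned pgno;
    int dirty;
    unsigned char data[DEMO_PAGE_SIZE];
};

/* qsort comparator: ascending page number, i.e. file order. */
static int cmp_pgno(const void *a, const void *b)
{
    const struct dirty_page *const *x = a, *const *y = b;
    return ((*x)->pgno > (*y)->pgno) - ((*x)->pgno < (*y)->pgno);
}

/* Open an anonymous scratch file (unlinked at once); returns fd or -1. */
static int open_scratch_file(void)
{
    char path[] = "/tmp/bdb-flush-demoXXXXXX";
    int fd = mkstemp(path);
    if (fd >= 0)
        unlink(path);
    return fd;
}

/* Before extending the file with page `target`, write every dirty page whose
 * number is below `target`, in ascending order. Stores the page numbers
 * written in `written` (if non-NULL). Returns pages flushed, or -1. */
static int flush_below_in_file_order(int fd, struct dirty_page *cache, size_t n,
                                     unsigned target, unsigned *written)
{
    struct dirty_page **batch = malloc((n ? n : 1) * sizeof *batch);
    size_t m = 0;
    assert(cache != NULL || n == 0);
    if (batch == NULL)
        return -1;
    for (size_t i = 0; i < n; i++)      /* collect dirty pages below target */
        if (cache[i].dirty && cache[i].pgno < target)
            batch[m++] = &cache[i];
    qsort(batch, m, sizeof *batch, cmp_pgno);
    for (size_t i = 0; i < m; i++) {
        off_t off = (off_t)batch[i]->pgno * DEMO_PAGE_SIZE;
        if (pwrite(fd, batch[i]->data, DEMO_PAGE_SIZE, off) != DEMO_PAGE_SIZE) {
            free(batch);
            return -1;
        }
        batch[i]->dirty = 0;            /* page is now clean on disk */
        if (written != NULL)
            written[i] = batch[i]->pgno;
    }
    free(batch);
    return (int)m;
}
```

Sorting only the pages below the target keeps each batch small. This sketch leaves open whether such a flush would reintroduce the concurrency cost that keeps Berkeley DB from maintaining a pure LRU list.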

> You could try defining HAVE_FILESYSTEM_NOTZERO in db_config.h, and
> changing the __os_fs_notzero function to return 1. This won't change
> the order in which pages are flushed from the cache, but will ensure
> that the file grows without holes.
>
> Please let us know if this does make a difference.

The results are mixed. When using DS with a small cache file, the
number of fragments is significantly reduced. However, loading the
database takes longer because there are now intervening fdatasync calls
(and lseek/write is used instead of pwrite). With a large cache, I
don't see much difference. Coincidentally, the number of fragments is
almost the same as in the -DHAVE_FILESYSTEM_NOTZERO case.

(I tested this with Berkeley DB 4.7.25 on GNU/Linux.)

