BDB infinite loop when opening dbm?
03-23-2006, 08:55 PM
This morning, one of our systems administrators asked me to look
at a daemon we'd written which uses Berkeley DB.
The daemon was stuck in an infinite loop on start-up. No threads
had been started; this was entirely a single process. No other
processes access this DB, and there were no other instances of this
daemon running.
What I found was the process endlessly cycling through these system calls:
] stat64("./var/db/svcrgy/__db.localSvcList.dbm", {st_mode=S_IFREG|0660, st_size=0, ...}) = 0
] open("./var/db/svcrgy/__db.localSvcList.dbm", O_RDWR|O_CREAT|O_EXCL|O_LARGEFILE, 0660) = -1 EEXIST (File exists)
] open("./var/db/svcrgy/__db.localSvcList.dbm", O_RDWR|O_CREAT|O_EXCL|O_LARGEFILE, 0660) = -1 EEXIST (File exists)
] open("./var/db/svcrgy/__db.localSvcList.dbm", O_RDWR|O_CREAT|O_EXCL|O_LARGEFILE, 0660) = -1 EEXIST (File exists)
] stat64("./var/db/svcrgy/localSvcList.dbm", 0xdfffe0d0) = -1 ENOENT (No such file or directory)
] stat64("./var/db/svcrgy/__db.localSvcList.dbm", {st_mode=S_IFREG|0660, st_size=0, ...}) = 0
] open("./var/db/svcrgy/__db.localSvcList.dbm", O_RDWR|O_CREAT|O_EXCL|O_LARGEFILE, 0660) = -1 EEXIST (File exists)
] open("./var/db/svcrgy/__db.localSvcList.dbm", O_RDWR|O_CREAT|O_EXCL|O_LARGEFILE, 0660) = -1 EEXIST (File exists)
] open("./var/db/svcrgy/__db.localSvcList.dbm", O_RDWR|O_CREAT|O_EXCL|O_LARGEFILE, 0660) = -1 EEXIST (File exists)
] stat64("./var/db/svcrgy/localSvcList.dbm", 0xdfffe0d0) = -1 ENOENT (No such file or directory)
Indeed, __db.localSvcList.dbm DID exist, and was zero bytes in
length. localSvcList.dbm did not exist.
I removed the __db.localSvcList.dbm file, and the daemon immediately
became happy: it created localSvcList.dbm and continued to function
normally. The __db.localSvcList.dbm file never returned.
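For what it's worth, the condition I cleaned up by hand could be checked for before the open. This is only a sketch of that check, not a fix for whatever left the file behind. The helper name stale_temp_file is made up for illustration, and the "__db." + filename naming is taken from the trace above, not from the Berkeley DB documentation. It only reports the file; removing it is safe only if nothing else is using the environment.

```python
import os

def stale_temp_file(db_dir, db_file):
    """Return the path of a zero-length "__db.<db_file>" left in db_dir
    when <db_file> itself does not exist, else None.

    Hypothetical helper: it mirrors the state seen in the trace
    (temp file present and empty, real database file missing).
    """
    temp_path = os.path.join(db_dir, "__db." + db_file)
    real_path = os.path.join(db_dir, db_file)
    if (os.path.isfile(temp_path)
            and os.path.getsize(temp_path) == 0
            and not os.path.exists(real_path)):
        return temp_path
    return None
```

A start-up script could call stale_temp_file("./var/db/svcrgy", "localSvcList.dbm") and log or alert on a non-None result before handing control to the daemon.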
Pardon my Python, but here's the code which opens a DB file:
] self.dbTxnFlags = DB_INIT_MPOOL|DB_INIT_LOCK|DB_INIT_LOG|DB_INIT_TXN
] self.dbEnvFlags = self.dbTxnFlags|DB_CREATE|DB_THREAD|DB_PRIVATE
] self.dbFileFlags = DB_CREATE|DB_THREAD|DB_AUTO_COMMIT
]
] if (not os.path.isdir(self.dbPath)):
]     raise camBDBException("DB directory does not exist.")
]
] # Run recovery only if the database file already exists.
] if (os.path.isfile("%s/%s" % (self.dbPath, self.dbFilePath))):
]     try:
]         recoverEnv = bsddb.db.DBEnv()
]         recoverEnv.open(self.dbPath, self.dbEnvFlags|DB_RECOVER)
]         recoverEnv.close()
]     except:
]         (err, why, tb) = sys.exc_info()
]         raise camBDBException("Recovery failed: %s, %s" % (err, why))
]
] # Open the environment, then the database file inside it.
] try:
]     self.bdbEnv = bsddb.db.DBEnv()
]     self.bdbEnv.set_lk_detect(DB_LOCK_OLDEST)
]     self.bdbEnv.set_flags(DB_LOG_AUTOREMOVE, 1)
]     self.bdbEnv.open(self.dbPath, self.dbEnvFlags)
]
]     self.bdbDB = bsddb.db.DB(dbEnv=self.bdbEnv)
]     self.bdbDB.open(self.dbFilePath, self.dbType, self.dbFileFlags)
] except:
]     (err, why, tb) = sys.exc_info()
]     raise camBDBException("DB open failed: %s, %s" % (err, why))
This is a Linux machine:
Linux ws522068 2.4.22-y9 #8 SMP Mon Jan 19 14:50:14 PST 2004 i686 unknown
And I'm using: Sleepycat Software: Berkeley DB 4.2.52: (December 3, 2003)
I checked the FAQ, but I didn't see anything relevant. Is there any
chance this is a known bug? I'm asking before I spend any significant
time reproducing the problem. At worst it happens rarely, given that
we have some thousands of instances running all the time.
I'd like to take some action that would put our systems administrators'
minds at ease. Any advice?
--
_______________________________________________________________________
Marco E. Nicosia | http://www.escape.org/~marco/ | marco (AT) escape (DOT) org |