
Concurrent access to a DB by independent processes

comp.databases.berkeley-db

#1
Olga Segal
Concurrent access to a DB by independent processes - 09-15-2003, 03:40 PM






Hello,
We are using Berkeley DB 4.0.14 and running four independent processes
(written in C) that access one shared index file. The first process to
start creates and opens an environment, and uses a file to notify all
following processes that they need to join the environment rather than
create it. Each process opens the index file in the environment with
read/write access. To read, we use the simple "get" function; to write,
we use transactions. There must be something we are doing wrong,
because after running for a variable length of time, some of our
processes die with either a segmentation fault or a bus error inside
internal Berkeley DB functions such as get_lockv or db_get, while one or
two out of four finish successfully. Sometimes the processes also seem
to block each other.
The Berkeley DB reference recommends using one parent process that
creates several threads to run in parallel, but that is not suitable
for our project.
Has anyone implemented anything similar?
Does version 4.0.14 support this at all, or do we need to upgrade?
I have a small-scale program that does the same thing our big program
does, and any help with making it work would be greatly appreciated.
olga_segal (AT) yahoo (DOT) com

#2
Hans-Bernhard Broeker
Re: Concurrent access to a DB by independent processes - 09-16-2003, 07:28 AM






Olga Segal <olga_segal (AT) yahoo (DOT) com> wrote:
Quote:
Hello,
We are using Berkley DB 4.0.14, and running four independent processes
(written in C), accessing one shared index file. The first process to
start creates and opens an environment and uses a file to notify all
following processes that they need to join the environment, and not to
create it.

Sounds like a broken plan for two reasons:

1) Race condition. You're "checking -- deciding -- acting". By the
time process A has decided that it has to create the environment, process
B might already have done so.

2) Needless double work. The DB environment already is represented by
a file --- there's no particular need to use another one just to
signal its state.

Quote:
The index file is opened in the environment by each process, with
read/write access. To read, we use simple "get" function, to write
we use transactions.

I doubt that can work. If you're going to use transactions, you'll
have to use transactions for everything, including read accesses.
Otherwise odds are the DB content changes under the feet of an ongoing
read access.

--
Hans-Bernhard Broeker (broeker (AT) physik (DOT) rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.
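
The check-then-act race described in point 1 can be avoided by letting the filesystem make the decision atomically: mkdir() either creates the directory or fails with EEXIST, so only one process can ever see it succeed. A minimal sketch follows. The names claim_creator_role and release_creator_marker are hypothetical helpers, not Berkeley DB functions, and the marker path is whatever the application chooses.

```c
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not part of Berkeley DB.
 * Tries to create the marker directory in a single atomic step.
 * Returns 1 if this caller created it (it is the creator),
 *         0 if it already existed (the caller should join),
 *        -1 on any other error. */
int claim_creator_role(const char *marker_dir)
{
    assert(marker_dir != NULL);
    if (mkdir(marker_dir, S_IRWXU) == 0)
        return 1;               /* this process won the race */
    if (errno == EEXIST)
        return 0;               /* another process got there first */
    return -1;                  /* e.g. missing parent directory */
}

/* Hypothetical helper: remove the marker again, for example when the
 * environment itself is being removed. Returns 0 on success. */
int release_creator_marker(const char *marker_dir)
{
    assert(marker_dir != NULL);
    return rmdir(marker_dir);
}
```

A process that gets 1 would go on to create the environment, and one that gets 0 would join it. The marker only settles who creates: a joiner can still arrive before the creator has finished opening the environment, so it would need to wait or retry its own open.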


#3
Keith Bostic
Re: Concurrent access to a DB by independent processes - 09-16-2003, 07:54 AM



olga_segal (AT) yahoo (DOT) com (Olga Segal) wrote in message news:<7efcf10b.0309151240.28a759a7 (AT) posting (DOT) google.com>...

Hi, my name is Keith Bostic and I'm with Sleepycat Software.

Quote:
We are using Berkley DB 4.0.14, and running four independent processes
(written in C), accessing one shared index file. The first process to
start creates and opens an environment and uses a file to notify all
following processes that they need to join the environment, and not to
create it. The index file is opened in the environment by each
process, with read/write access. To read, we use simple "get"
function, to write we use transactions.

Berkeley DB can absolutely support this architecture. Can you post
your test program?

Regards,
--keith

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Keith Bostic bostic (AT) sleepycat (DOT) com
Sleepycat Software Inc. keithbosticim (ymsgid)
118 Tower Rd. +1-781-259-3139
Lincoln, MA 01773 http://www.sleepycat.com


#4
Michael Ubell
Re: Concurrent access to a DB by independent processes - 09-16-2003, 09:41 AM




Hans-Bernhard Broeker wrote:

Quote:
The index file is opened in the environment by each process, with
read/write access. To read, we use simple "get" function, to write
we use transactions.


I doubt that can work. If you're going to use transactions, you'll
have to use transactions for everything, including read accesses.
Otherwise odds are the DB content changes under the feet of an ongoing
read access.

Running reads without transactions does work. You do not get repeatable
reads (that is, the data can change each time you read it), but you are
protected from seeing uncommitted updates.

Michael Ubell
Sleepycat Software.



#5
Olga Segal
Re: Concurrent access to a DB by independent processes - 09-16-2003, 03:27 PM



bostic (AT) sleepycat (DOT) com (Keith Bostic) wrote in message news:<adecb6f.0309160454.6c1c511f (AT) posting (DOT) google.com>...
Quote:
olga_segal (AT) yahoo (DOT) com (Olga Segal) wrote in message news:<7efcf10b.0309151240.28a759a7 (AT) posting (DOT) google.com>...

Hi, my name is Keith Bostic and I'm with Sleepycat Software.

We are using Berkley DB 4.0.14, and running four independent processes
(written in C), accessing one shared index file. The first process to
start creates and opens an environment and uses a file to notify all
following processes that they need to join the environment, and not to
create it. The index file is opened in the environment by each
process, with read/write access. To read, we use simple "get"
function, to write we use transactions.

Berkeley DB can absolutely support this architecture. Can you post
your test program?

Thank you all for the prompt responses.
Keith,
the test database used by the program was created by another C
program using db->put(). It has 10 entries: the key is a string of 3
chars, from "000" to "009", and the data is a char 'A'.
Below is a simple program that accepts a key as an argument and
changes the data corresponding to that key from 'A' to 'W'.
It goes into a long-running loop in which it updates that one key/data
pair and displays the content of the file.
If that same program is started in another session and ANOTHER key is
passed as a parameter (which should prevent both processes from
conflicting, right?), one or both programs crash. Looking at the output,
though, it's clear that before the crash they both updated the
appropriate keys and even saw changes made by each other.

#include <stdio.h>
#include <stdlib.h>
#include <sys/errno.h>
#include <sys/stat.h>
#include <string.h>
#include <strings.h>
#include <signal.h>
#include <fcntl.h>
#include <db.h>
#include <errno.h>
#include <pthread.h>
#include <stdarg.h>
#include <unistd.h>

#define ENV_DIR "TEST"
#define TABLE_NAME "table.idx"
#define FLAG_DIR "_db_flag"
#define FLAG_DB "_file_flag"

void env_dir_create(void);
void env_open(DB_ENV **);
void update_key(char* key1);
void check_key(char* key);
void open_db(DB_ENV *dbenv, DB **dbp, char* name, int dups);
void display_db(void);
void close_db(DB_ENV *dbenv, DB *dbp, char* name);
void close_env(DB_ENV *dbenv);

DB_ENV *dbenv;
DB* table;

int main(int argc, char* argv[])
{
char key1[10];

if (argc != 2)
{
printf("Enter one key\n");
exit(1);
}
//copy the key, truncating rather than overflowing key1
strncpy(key1, argv[1], sizeof(key1) - 1);
key1[sizeof(key1) - 1] = '\0';

//creating a directory if does not exist
env_dir_create();

//Open/join the environment
env_open(&dbenv);

//open the database in this env
open_db(dbenv, &table, TABLE_NAME, 0);

char key_cp[10+1];
char data_cp[1+1];

//loop and update one key in the table
//and display the table
unsigned long int j;

//Display what's in the database in the beginning
display_db();

for (j=0; j<400000000;j++)
{
printf("----------- %lu -------------\n", j);

update_key(key1);

display_db();

}//end for ;;

//close the database
close_db(dbenv, table, TABLE_NAME);

//close the environment
close_env(dbenv);
return 0;
}

void close_db(DB_ENV *dbenv, DB *dbp, char* name)
{
int retc;
if ( (retc = dbp->close(dbp, 0)) != 0)
{
dbenv->err(dbenv, retc, "dbp->close %s", name);
exit(1);
}
}
void close_env(DB_ENV *dbenv)
{
int retc;
if ( (retc = dbenv->close(dbenv, 0)) != 0)
{
dbenv->err(dbenv, retc, "dbenv->close");
exit(1);
}
}

void display_db(void)
{
char key_cp[10+1];
int i;

for (i=0; i<10; i++)
{
sprintf(key_cp, "%03d", i);

check_key(key_cp);
}
}

void open_db(DB_ENV *dbenv, DB **dbp, char* name, int dups)
{
int retc;
DB *db;

//create the datanase handle
if ( (retc = db_create(&db, dbenv, 0)) != 0)
{
dbenv->err(dbenv, retc, "db_create");
exit(1);
}
//if needed, turn on duplicate data items
if (dups && (retc = db->set_flags(db, DB_DUP)) != 0)
{
dbenv->err(dbenv, retc, "db->set_flags: DB_DUP");
exit(1);
}

struct stat sb;
if ( stat(FLAG_DB, &sb) == 0)
{
//flag exists, meaning the environment was already created by a
//previous process and the database was already created (if needed)
printf("FLAG_DB exist, Opening the db.\n");

//open a database in the environment
if ( (retc= db->open(db, name, NULL,
DB_BTREE,
0,
0)) != 0)
{
dbenv->err(dbenv, retc, "db->open: %s", name);
exit(1);
}
}
else
{
//flag is not created yet, meaning this is the first process.
//creating the env
printf("FLAG_DB DOES NOT exist, Creating and opening db\n");

if ( (retc= db->open(db, name, NULL,
DB_BTREE,
DB_CREATE|DB_THREAD,
0)) != 0)
{
dbenv->err(dbenv, retc, "db->open: %s", name);
exit(1);
}

//create a flag directory for next processes
if ( mkdir(FLAG_DB, S_IRWXU) !=0 )
{
printf("Error creating directory %s\n", FLAG_DB);
exit(1);
}
}//end else

//return the handle
*dbp = db;
}

void env_dir_create(void)
{
struct stat sb;
if ( stat(ENV_DIR, &sb) == 0)
{
printf("Directory exist\n");
return;
}
else
{
if ( mkdir(ENV_DIR, S_IRWXU) !=0 )
{
printf("Error creating directory %s\n", ENV_DIR);
exit(1);
}
}
}

void env_open(DB_ENV **dbenvp)
{
DB_ENV *dbenv;
int ret;

//create
if ( (ret=db_env_create(&dbenv, 0)) != 0 )
{
printf("Error creating the environment\n");
exit(1);
}

//set up error handling
dbenv->set_errpfx(dbenv, "test");

//do deadlock detection internally
if ( (ret = dbenv->set_lk_detect(dbenv, DB_LOCK_DEFAULT)) != 0)
{
dbenv->err(dbenv, ret, "set_lk_detect: DB_LOCK_DEFAULT");
exit(1);
}

//need to either create an environment (if this is the first process)
//or join the existing environment (all following processes)
struct stat sb;
if ( stat(FLAG_DIR, &sb) == 0)
{
//flag exists, meaning the environment was already created by a
//previous process.
//need to join it
printf("FLAG_DIR exist, Joining the environment\n");

//join the existing environment
if ( (ret = dbenv->open(dbenv, ENV_DIR,
DB_JOINENV,
0)) != 0)
{
dbenv->err(dbenv, ret, "dbenv->open: %s", ENV_DIR);
exit(1);
}
//return;
}
else
{
//flag is not created yet, meaning this is the first process.
//creating the env
printf("FLAG_DIR DOES NOT exist, Creating the environment\n");
if ( (ret = dbenv->open(dbenv, ENV_DIR,
DB_INIT_TXN|DB_RECOVER|
DB_CREATE | DB_INIT_LOCK | DB_INIT_MPOOL|DB_THREAD,
0)) != 0)
{
dbenv->err(dbenv, ret, "dbenv->open: %s", ENV_DIR);
dbenv->close(dbenv, 0);
exit(1);
}

//creating a FLAG
//printf("Creating FLAG_FILE\n");
if ( mkdir(FLAG_DIR, S_IRWXU) !=0 )
{
printf("Error creating directory %s\n", FLAG_DIR);
exit(1);
}

}
*dbenvp = dbenv;
}

void update_key(char* key1_cp)
{
DBT key,data;
DB_TXN *tid;
int ret;

memset(&key, 0, sizeof(key));
memset(&data, 0, sizeof(data));
key.data = (void*)key1_cp;
key.size = strlen(key1_cp)+1;

data.flags = DB_DBT_MALLOC;

if ((ret = table->get(table, NULL, &key, &data, 0)) == 0)
{
//overwrite the first byte in place; the record length is data.size,
//and the stored value may not be NUL-terminated, so avoid strcpy/strlen
if (data.size > 0)
((char*)data.data)[0] = 'W';
}
else if (ret == DB_LOCK_DEADLOCK)
{
printf("DB_LOCK_DEADLOCK\n");
return;
}
else if (ret == DB_NOTFOUND)
{
printf("DB_NOTFOUND for key = %s\n", key.data);
return;
}
else if (ret == DB_KEYEMPTY)
{
printf("DB_KEYEMPTY for key = %s\n", key.data);
return;
}
else if (ret == DB_KEYEXIST )
{
printf("DB_KEYEXIST for key = %s\n", key.data);
return;
}
else
{
printf("Key = %s not found\n", key.data);
return;
}

//transaction
for (;;)
{
if ( (ret = txn_begin(dbenv, NULL, &tid, 0)) != 0 )
{
dbenv->err(dbenv, ret, "txn_begin");
exit(1);
}

//update the key
switch (ret = table->put(table, tid, &key, &data, 0))
{
case 0:
//Success: commit the change.
if ( (ret = txn_commit(tid, 0)) !=0 )
{
dbenv->err(dbenv, ret, "txn_commit");
exit(1);
}

if (data.data != NULL)
{
free(data.data);
}
return;
break;
case DB_LOCK_DEADLOCK:
printf("DB_LOCK_DEADLOCK\n");
if ( (ret = txn_abort(tid))!=0 )
{
dbenv->err(dbenv, ret, "txn_abort");
exit(1);
}
break;
default:
//error
printf("DEFAULT\n");
dbenv->err(dbenv, ret, "table->put: %s", (char*)key.data);
exit(1);
break;
} //end switch

}//end for
if (data.data != NULL)
free(data.data);

return;
}

void check_key(char *key_cp)
{
DBT key,data;
DB_TXN *tid;
int ret;

//init
memset(&key, 0, sizeof(key));
memset(&data, 0, sizeof(data));
key.data = (void*)key_cp;
key.size = strlen(key_cp)+1;

data.flags = DB_DBT_MALLOC;


if ((ret = table->get(table, NULL, &key, &data, 0)) == 0)
{
printf("Key = %s, Data = %s\n", key.data, data.data);
}
else if (ret == DB_LOCK_DEADLOCK)
{
printf("DB_LOCK_DEADLOCK\n");
}
else if (ret == DB_NOTFOUND)
{
printf("DB_NOTFOUND for key = %s\n", key.data);
}
else if (ret == DB_KEYEMPTY)
{
printf("DB_KEYEMPTY for key = %s\n", key.data);
}
else if (ret == DB_KEYEXIST )
{
printf("DB_KEYEXIST for key = %s\n", key.data);
}
else
{
printf("Key = %s not found\n", key.data);
}

if (data.data != NULL)
{
free(data.data);
}
return;
}
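
One thing worth checking in update_key above: sizeof(data.data) is the size of a pointer, not the length of the record, and strcpy() assumes the stored value ends in a NUL byte. If the database was loaded with a single char 'A' and no terminator, both assumptions fail. The record length is carried in data.size. A minimal sketch of a length-aware copy follows; dup_as_cstring is a hypothetical helper name, not a Berkeley DB function.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy `size` bytes, which may lack a trailing NUL,
 * into a freshly allocated NUL-terminated string.
 * Returns NULL if allocation fails. The caller frees the result. */
char *dup_as_cstring(const void *bytes, size_t size)
{
    assert(bytes != NULL || size == 0);
    char *out = malloc(size + 1);
    if (out == NULL)
        return NULL;
    if (size > 0)
        memcpy(out, bytes, size);   /* copy exactly the record, no more */
    out[size] = '\0';               /* terminate so it is safe to print */
    return out;
}
```

With a DBT this would be called as dup_as_cstring(data.data, data.size), and the result can be passed to printf("%s") safely.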

#6
Keith Bostic
Re: Concurrent access to a DB by independent processes - 09-19-2003, 09:35 AM



olga_segal (AT) yahoo (DOT) com (Olga Segal) wrote in message news:<7efcf10b.0309161227.725ed6a9 (AT) posting (DOT) google.com>...
Quote:
If that same program is started in another session, and ANOTHER key is
passed as a parameter (which should prevent both processes from
conflicting, right?), one or both programs crash, but by looking at the
output its clear that before the crash they both updated appropriate
keys and even saw changes made by each other.

You've configured Berkeley DB for locking (DB_INIT_LOCK), so it
doesn't matter if the threads of control use a different key or
not. With locking configured, access to the database will be
serialized.

Anyway, I've just run your program with 7 different processes,
for about 40,000 iterations, without failure.

If you start with a clean database environment directory, and
start N copies of the program, how long does it take for you to
reproduce the failure? Are you ever interrupting the program
and restarting it before the failures occur?

Can you please send me a stack trace from one of the crashes?

With what hardware and operating system are you running your
tests?

Regards,
--keith
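
Since lock conflicts are possible even when the processes work on different keys, each update has to expect that it may be picked as a deadlock victim and must start over. Below is a minimal sketch of that retry pattern, written independently of the Berkeley DB API. The names run_with_retry, attempt_fn and the result codes are hypothetical, and deadlock_n_times is only a stand-in for a real attempt, which would begin a transaction, do its reads and writes inside it, and then either commit or abort and report the deadlock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for one attempt at a unit of work. */
enum attempt_result { ATTEMPT_OK, ATTEMPT_DEADLOCK, ATTEMPT_FAILED };

typedef enum attempt_result (*attempt_fn)(void *ctx);

/* Run `attempt` until it succeeds, fails hard, or max_attempts is used up.
 * Returns the number of attempts made on success, -1 otherwise. */
int run_with_retry(attempt_fn attempt, void *ctx, int max_attempts)
{
    assert(attempt != NULL && max_attempts > 0);
    for (int n = 1; n <= max_attempts; n++) {
        switch (attempt(ctx)) {
        case ATTEMPT_OK:
            return n;           /* committed */
        case ATTEMPT_DEADLOCK:
            break;              /* aborted as deadlock victim: try again */
        default:
            return -1;          /* any other error is not retried */
        }
    }
    return -1;                  /* gave up after max_attempts deadlocks */
}

/* Stand-in attempt for demonstration: reports a deadlock until the
 * counter pointed to by ctx reaches zero, then succeeds. */
enum attempt_result deadlock_n_times(void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return ATTEMPT_DEADLOCK;
    }
    return ATTEMPT_OK;
}
```

Bounding the number of attempts is a design choice made for this sketch. The posted program retries forever instead.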

#7
Olga Segal
Re: Concurrent access to a DB by independent processes - 09-22-2003, 11:04 AM



Keith,
thank you very much for your effort to help. Unfortunately, some of our
hardware still hasn't been brought back up after Hurricane Isabel, and
I can't provide you with more info at the moment, but I will do so
as soon as possible.

Olga

#8
Olga Segal
Re: Concurrent access to a DB by independent processes - 09-23-2003, 04:28 PM



bostic (AT) sleepycat (DOT) com (Keith Bostic) wrote in message news:<adecb6f.0309190635.272e278e (AT) posting (DOT) google.com>...
Quote:
Can you please send me a stack trace from one of the crashes?

In answer to your questions, is there a way I can send you some sample
output along with the different scenarios I ran into during testing? I
tried to use bostic (AT) sleepycat (DOT) com, but the message was
rejected. I'm not sure if there is a way to attach a file to a message
here on Google Groups.

Quote:
With what hardware and operating system are you running your
tests?
We use UNIX OS version B.11.00; the machine is an HP 9000 N Class.
Some software runs on an HP XP512 disk array, which is shared between
machines, but I ruled out the XP512 as the cause of the problem by
running my test program on a different server that is not attached to
the XP. (The program failed there too.)

#9
Keith Bostic
Re: Concurrent access to a DB by independent processes - 09-23-2003, 08:14 PM



olga_segal (AT) yahoo (DOT) com (Olga Segal) wrote in message news:<7efcf10b.0309231328.505d72aa (AT) posting (DOT) google.com>...
Quote:
In answer to your questions, is there a way I can send you some sample
output along with different scenarios I ran into during testing? I
tried to use bostic (AT) sleepycat (DOT) com, but the message was said to be
rejected.

Sending email to bostic (AT) sleepycat (DOT) com is fine, or sending email
to support (AT) sleepycat (DOT) com would get into Sleepycat's support
system, if that's simpler for you.

Are you using a remote mounted filesystem (for example, NFS) as
the underlying filesystem for your database environment?

Regards,
--keith
