dbTalk Databases Forums  

improvement suggestions for performance design

comp.databases.postgresql comp.databases.postgresql


Discuss improvement suggestions for performance design in the comp.databases.postgresql forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Tom Forsmo
 
Posts: n/a

Default improvement suggestions for performance design - 07-05-2007 , 07:16 AM






Hi

I have the following scenario for a database that I need to design, and
would like some hints on what to improve or do differently to achieve
the desired performance goal, disregarding hardware and postgres tuning.

The premise is an attribute database that stores about 100 different
attribute types as attribute values. Every X seconds, Y number of new
attribute values are stored in the database. X is constant and currently
between 6 and 20 seconds, depending on the setup. In the future X
could become as low as 3 seconds. Y can, within the next 5-10 years,
become as high as 200 000.

That means that for example, every 6 seconds 100 000 attributes needs to
be written to the database.

At the same time, somewhere between 5-20 users needs to read parts of
those newly written attributes, maybe in total 30 000 attributes.

This continues for the duration of the field operation, which could be
18hrs a day for 6 weeks. So the total db size is up towards 200 gigs.

Now here is how I suggest doing this:

1- the tables

table attribute_values:
id int
attr_type int ( references attribute_types(id) )
posX int
posY int
data_type int
value varchar(50)

table attribute_types:
id int
name varchar(200);



2- function

a function that receives an array of data and inserts each attribute.
perhaps one array per attribute data (type, posX, posY, data_type,
value) so five arrays as in parameters ot the function

3- java client

the client receives the data from a corba request, and splits it
into, say 4 equally sized blocks and executes 4 threads that insert
each block (this seems to be more efficient than just using one
thread.)

Now I am wondering if this is the most efficient way of doing it?

- I know that I could group the attributes so that each type of
attribute gets its own table with all attributes in one row. But I am
not sure if that is any more efficient than ont attribute per row since
I pass everything to the function as an array.
With the above design a change in attribute types only requires changing
the data in a table instead of having to modify the client, the function
and the tables.

- I am also wondering if writing the client and function in C would
create a more efficient solution.

any comments?

ps, I am currently running postgres 8.1, but could probably use 8.2 if
it is needed for functionality or performance reasons. It will run on a
sparc machine with solaris 10 and perhaps 4-6 processors, as many GB of
RAM as necessary and SCSI disks ( perhaps in raid 0 ).

regards

tom


Reply With Quote
  #2  
Old   
Laurenz Albe
 
Posts: n/a

Default Re: improvement suggestions for performance design - 07-24-2007 , 09:02 AM






Tom Forsmo <nospam (AT) nospam (DOT) net> wrote:
Quote:
I have the following scenario for a database that I need to design, and
would like some hints on what to improve or do differently to achieve
the desired performance goal, disregarding hardware and postgres tuning.

The premise is an attribute database that stores about 100 different
attribute types as attribute values. Every X seconds, Y number of new
attribute values are stored in the database. X is constant and currently
between 6 and 20 seconds, depending on the setup. In the future X
could become as low as 3 seconds. Y can, within the next 5-10 years,
become as high as 200 000.

That means that for example, every 6 seconds 100 000 attributes needs to
be written to the database.

At the same time, somewhere between 5-20 users needs to read parts of
those newly written attributes, maybe in total 30 000 attributes.

This continues for the duration of the field operation, which could be
18hrs a day for 6 weeks. So the total db size is up towards 200 gigs.
Maybe somebody else can say if inserting 100000 rows every 6 seconds
while concurrently querying as you describe will be possible.

I assume that you WILL need to tune the database and/or the machine for this.

Quote:
Now here is how I suggest doing this:

1- the tables
I can't see much wrong with that.

Quote:
2- function

a function that receives an array of data and inserts each attribute.
perhaps one array per attribute data (type, posX, posY, data_type,
value) so five arrays as in parameters ot the function

3- java client

the client receives the data from a corba request, and splits it
into, say 4 equally sized blocks and executes 4 threads that insert
each block (this seems to be more efficient than just using one
thread.)
The most efficient way to add data to a PostgreSQL database is the
COPY statement. Using INSERT will be MUCH less efficient. You should
use COPY.

The problem is that JDBC does not support the COPY statement. I have seen
patches that add this functionality, but I don't know if they are
available for current versions, and maintaining patched versions is
a pain.

I would try to use the C API on the database client, it supports COPY.

I would try it without multithreading on the client and check what is the
performance limiting factor. If it is on the client or if it related to
CPU on the server (which has multiple CPUs), multithreading might help.

I don't think that using a PostgreSQL function would speed up things.

Yours,
Laurenz Albe


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.