dbTalk Databases Forums  

Retrieving 300K+ records

mailing.database.mysql-plusplus mailing.database.mysql-plusplus


Discuss Retrieving 300K+ records in the mailing.database.mysql-plusplus forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Alex Vishnev
 
Posts: n/a

Default Retrieving 300K+ records - 10-25-2005 , 02:16 PM






Hello,

I am not sure if this is OT for this list. If so, please direct me to the
proper place. I am using mysql++ to retrieve rows from a table. The table
has between 300K-400K rows in it. the table is not indexed. I need to
retrieve all the rows and load them into memory. I am using ResUse class to
receive the records from the query. See below

Query q << "select * from table";

mysqlpp::ResUse use = q.use();
while ((row = use.fetch_row()))
{
string col1 = (string)row.lookup_by_name("col1");
string col2 = (string)row.lookup_by_name("col2")
float col3 = row.lookup_by_name("col3");
}

It takes approx 12-14sec to read thru all the records. At the same time, I
have similar application that reads 400K from a flat file and parses the
data prior to loading into memory and it takes 2-3s. BTW, the process of
loading the records into memory is the same for both applications. In both
cases I am using the same computer (processor), same disks, same file
system. In both cases the request to load file does not traverse the network
as both db and files are local to the system. So I am a little confused why
flat file loading is so much faster. I tried to increase per-connection
buffers in my.cnf, but don't seem to matter so much. Here is what I adjusted
so far:

key_buffer = 128M
max_allowed_packet = 1M
table_cache = 256
sort_buffer_size = 64M
read_buffer_size = 64M
read_rnd_buffer_size = 4M
myisam_sort_buffer_size = 64M
thread_cache = 8
query_cache_size= 16M
# Try number of CPU's*2 for thread_concurrency
thread_concurrency = 4

has anyone experimented with something like that? can anyone explain why it
takes 3/4 times longer to load data from mysql then it is from a flat file?
Can anything be done to speed it up?

MySQL = 4.1.12-standard
MySQL++ = 1.7.32
OS =RH Linux ES3.0



--
MySQL++ Mailing List
For list archives: http://lists.mysql.com/plusplus
To unsubscribe: http://lists.mysql.com/plusplus?unsu...ie.nctu.edu.tw


Reply With Quote
  #2  
Old   
Earl Miles
 
Posts: n/a

Default Re: Retrieving 300K+ records - 10-25-2005 , 02:27 PM






Alex Vishnev wrote:
Quote:
Hello,

I am not sure if this is OT for this list. If so, please direct me to the
proper place. I am using mysql++ to retrieve rows from a table. The table
has between 300K-400K rows in it. the table is not indexed. I need to
retrieve all the rows and load them into memory. I am using ResUse class to
receive the records from the query. See below

Query q << "select * from table";

mysqlpp::ResUse use = q.use();
while ((row = use.fetch_row()))
{
string col1 = (string)row.lookup_by_name("col1");
string col2 = (string)row.lookup_by_name("col2")
float col3 = row.lookup_by_name("col3");
}

It takes approx 12-14sec to read thru all the records. At the same time, I
have similar application that reads 400K from a flat file and parses the
data prior to loading into memory and it takes 2-3s. BTW, the process of
loading the records into memory is the same for both applications. In both
cases I am using the same computer (processor), same disks, same file
system. In both cases the request to load file does not traverse the network
as both db and files are local to the system. So I am a little confused why
flat file loading is so much faster. I tried to increase per-connection
buffers in my.cnf, but don't seem to matter so much. Here is what I adjusted
so far:

key_buffer = 128M
max_allowed_packet = 1M
table_cache = 256
sort_buffer_size = 64M
read_buffer_size = 64M
read_rnd_buffer_size = 4M
myisam_sort_buffer_size = 64M
thread_cache = 8
query_cache_size= 16M
# Try number of CPU's*2 for thread_concurrency
thread_concurrency = 4

has anyone experimented with something like that? can anyone explain why it
takes 3/4 times longer to load data from mysql then it is from a flat file?
Can anything be done to speed it up?

MySQL = 4.1.12-standard
MySQL++ = 1.7.32
OS =RH Linux ES3.0
In my application, for very very large queries I find that I have to bypass
MySQL++ and go directly to the mysql layer. I find the Row object to be a little
slow; I don't notice it in smaller queries (under 1,000 records) but on very
large queries...it becomes noticeable.

--
MySQL++ Mailing List
For list archives: http://lists.mysql.com/plusplus
To unsubscribe: http://lists.mysql.com/plusplus?unsu...ie.nctu.edu.tw



Reply With Quote
  #3  
Old   
Alex Vishnev
 
Posts: n/a

Default RE: Retrieving 300K+ records - 10-25-2005 , 03:03 PM



Earl,

Thanks for the reply. I was afraid of that... I read in the documentation
somewhere that MySQL++ Row object is not a simple wrapper class and does a
lot more. I was hoping that it would do it with less overhead. Before I jump
into mysql C library, can anyone else confirm or deny that overhead of Row
object (or others) can degrade the performance on large queries?

Earl, would it be possible to send me a mysql C example on how to retrieve
large amount of rows (off list, please ;-))

Alex

-----Original Message-----
From: Earl Miles [mailto:merlin (AT) logrus (DOT) com]
Sent: Tuesday, October 25, 2005 3:27 PM
To: plusplus (AT) lists (DOT) mysql.com
Subject: Re: Retrieving 300K+ records

Alex Vishnev wrote:
Quote:
Hello,

I am not sure if this is OT for this list. If so, please direct me to the
proper place. I am using mysql++ to retrieve rows from a table. The table
has between 300K-400K rows in it. the table is not indexed. I need to
retrieve all the rows and load them into memory. I am using ResUse class
to
receive the records from the query. See below

Query q << "select * from table";

mysqlpp::ResUse use = q.use();
while ((row = use.fetch_row()))
{
string col1 = (string)row.lookup_by_name("col1");
string col2 = (string)row.lookup_by_name("col2")
float col3 = row.lookup_by_name("col3");
}

It takes approx 12-14sec to read thru all the records. At the same time, I
have similar application that reads 400K from a flat file and parses the
data prior to loading into memory and it takes 2-3s. BTW, the process of
loading the records into memory is the same for both applications. In both
cases I am using the same computer (processor), same disks, same file
system. In both cases the request to load file does not traverse the
network
as both db and files are local to the system. So I am a little confused
why
flat file loading is so much faster. I tried to increase per-connection
buffers in my.cnf, but don't seem to matter so much. Here is what I
adjusted
so far:

key_buffer = 128M
max_allowed_packet = 1M
table_cache = 256
sort_buffer_size = 64M
read_buffer_size = 64M
read_rnd_buffer_size = 4M
myisam_sort_buffer_size = 64M
thread_cache = 8
query_cache_size= 16M
# Try number of CPU's*2 for thread_concurrency
thread_concurrency = 4

has anyone experimented with something like that? can anyone explain why
it
takes 3/4 times longer to load data from mysql then it is from a flat
file?
Can anything be done to speed it up?

MySQL = 4.1.12-standard
MySQL++ = 1.7.32
OS =RH Linux ES3.0
In my application, for very very large queries I find that I have to bypass
MySQL++ and go directly to the mysql layer. I find the Row object to be a
little
slow; I don't notice it in smaller queries (under 1,000 records) but on very

large queries...it becomes noticeable.

--
MySQL++ Mailing List
For list archives: http://lists.mysql.com/plusplus
To unsubscribe:
http://lists.mysql.com/plusplus?unsu...line (DOT) net




--
MySQL++ Mailing List
For list archives: http://lists.mysql.com/plusplus
To unsubscribe: http://lists.mysql.com/plusplus?unsu...ie.nctu.edu.tw



Reply With Quote
  #4  
Old   
Alex Vishnev
 
Posts: n/a

Default RE: Retrieving 300K+ records - 10-25-2005 , 07:11 PM



All,

It looks like Earl is correct. I modified my code based on his suggestion to
avoid using mysql++ fetch_row from ResUse object and used mysql_fetch_row
instead. That cut my record reading time in 1/2. Now it takes approximately
6-7sec to load the data, instead of original 12-14s. Even though the time to
load records is significantly less, it is still not the same as with using
flat file (2-3sec). is there anything else I can change to improve the
performance?

Alex

-----Original Message-----
From: Earl Miles [mailto:merlin (AT) logrus (DOT) com]
Sent: Tuesday, October 25, 2005 3:27 PM
To: plusplus (AT) lists (DOT) mysql.com
Subject: Re: Retrieving 300K+ records

Alex Vishnev wrote:
Quote:
Hello,

I am not sure if this is OT for this list. If so, please direct me to the
proper place. I am using mysql++ to retrieve rows from a table. The table
has between 300K-400K rows in it. the table is not indexed. I need to
retrieve all the rows and load them into memory. I am using ResUse class
to
receive the records from the query. See below

Query q << "select * from table";

mysqlpp::ResUse use = q.use();
while ((row = use.fetch_row()))
{
string col1 = (string)row.lookup_by_name("col1");
string col2 = (string)row.lookup_by_name("col2")
float col3 = row.lookup_by_name("col3");
}

It takes approx 12-14sec to read thru all the records. At the same time, I
have similar application that reads 400K from a flat file and parses the
data prior to loading into memory and it takes 2-3s. BTW, the process of
loading the records into memory is the same for both applications. In both
cases I am using the same computer (processor), same disks, same file
system. In both cases the request to load file does not traverse the
network
as both db and files are local to the system. So I am a little confused
why
flat file loading is so much faster. I tried to increase per-connection
buffers in my.cnf, but don't seem to matter so much. Here is what I
adjusted
so far:

key_buffer = 128M
max_allowed_packet = 1M
table_cache = 256
sort_buffer_size = 64M
read_buffer_size = 64M
read_rnd_buffer_size = 4M
myisam_sort_buffer_size = 64M
thread_cache = 8
query_cache_size= 16M
# Try number of CPU's*2 for thread_concurrency
thread_concurrency = 4

has anyone experimented with something like that? can anyone explain why
it
takes 3/4 times longer to load data from mysql then it is from a flat
file?
Can anything be done to speed it up?

MySQL = 4.1.12-standard
MySQL++ = 1.7.32
OS =RH Linux ES3.0
In my application, for very very large queries I find that I have to bypass
MySQL++ and go directly to the mysql layer. I find the Row object to be a
little
slow; I don't notice it in smaller queries (under 1,000 records) but on very

large queries...it becomes noticeable.

--
MySQL++ Mailing List
For list archives: http://lists.mysql.com/plusplus
To unsubscribe:
http://lists.mysql.com/plusplus?unsu...line (DOT) net




--
MySQL++ Mailing List
For list archives: http://lists.mysql.com/plusplus
To unsubscribe: http://lists.mysql.com/plusplus?unsu...ie.nctu.edu.tw



Reply With Quote
  #5  
Old   
Chris Frey
 
Posts: n/a

Default Re: Retrieving 300K+ records - 10-26-2005 , 12:41 PM



On Tue, Oct 25, 2005 at 03:14:52PM -0400, Alex Vishnev wrote:
Quote:
Query q << "select * from table";

mysqlpp::ResUse use = q.use();
while ((row = use.fetch_row()))
{
string col1 = (string)row.lookup_by_name("col1");
string col2 = (string)row.lookup_by_name("col2")
float col3 = row.lookup_by_name("col3");
}
Before you jump completely to C, you might try using the .at() function
to reference fields by index instead of by name. This removes the repeated
name lookup on each cycle of your loop.

I haven't done timings, so I don't know if this is significant.

- Chris


--
MySQL++ Mailing List
For list archives: http://lists.mysql.com/plusplus
To unsubscribe: http://lists.mysql.com/plusplus?unsu...ie.nctu.edu.tw



Reply With Quote
  #6  
Old   
Earl Miles
 
Posts: n/a

Default Re: Retrieving 300K+ records - 10-26-2005 , 12:47 PM



Chris Frey wrote:
Quote:
On Tue, Oct 25, 2005 at 03:14:52PM -0400, Alex Vishnev wrote:

Query q << "select * from table";

mysqlpp::ResUse use = q.use();
while ((row = use.fetch_row()))
{
string col1 = (string)row.lookup_by_name("col1");
string col2 = (string)row.lookup_by_name("col2")
float col3 = row.lookup_by_name("col3");
}


Before you jump completely to C, you might try using the .at() function
to reference fields by index instead of by name. This removes the repeated
name lookup on each cycle of your loop.

I haven't done timings, so I don't know if this is significant.

- Chris
The name lookups aren't what cause the slowdown (for me, at least) -- it's
translating from a mysql row to a MySQL++ Row object that takes all the time.

--
MySQL++ Mailing List
For list archives: http://lists.mysql.com/plusplus
To unsubscribe: http://lists.mysql.com/plusplus?unsu...ie.nctu.edu.tw



Reply With Quote
  #7  
Old   
Chris Frey
 
Posts: n/a

Default Re: Retrieving 300K+ records - 10-26-2005 , 01:12 PM



On Wed, Oct 26, 2005 at 10:46:49AM -0700, Earl Miles wrote:
Quote:
Before you jump completely to C, you might try using the .at() function
to reference fields by index instead of by name. This removes the repeated
name lookup on each cycle of your loop.


The name lookups aren't what cause the slowdown (for me, at least) -- it's
translating from a mysql row to a MySQL++ Row object that takes all the
time.
Interesting. Looking at the code, there are about 3 objects, one of which
contain a vector of strings, involved in moving the data.

I think this goes back to that iterator optimization that I never got
around to. :-) There might have even been a patch floating around if
memory serves.

- Chris


--
MySQL++ Mailing List
For list archives: http://lists.mysql.com/plusplus
To unsubscribe: http://lists.mysql.com/plusplus?unsu...ie.nctu.edu.tw



Reply With Quote
  #8  
Old   
Alex Vishnev
 
Posts: n/a

Default RE: Retrieving 300K+ records - 10-26-2005 , 01:35 PM



Chris,

The problem could very well be in iterator optimization. What I have =
done is
bypassed sql++ integrator and used mysql_fetch_row directly on returned
result set. After I obtain ResUse object reference, I retrieve MYSQL_RES
with mysql_result() member call from ResUse object. That's how I save =
1/2 of
processing time. However, it is still not as good as loading data from a
file. right now, it is double the time it takes to load data from the =
file.
At the same time, if I go into mysql and copy all rows from one table =
into
another, it takes approx 1-2s.=20

mysql> create table abc select * from oldtab;
Query OK, 397913 rows affected (1.12 sec)
Records: 397913 Duplicates: 0 Warnings: 0

So I am thinking if internally mysql takes 1-2 sec when walking thru old
table and creating new table, I should get similar performance. Maybe I =
am a
little na=EFve, but.... I did not expect it to be 2 to 3 times worse =
than I
have seen in db.

Alex

-----Original Message-----
From: Chris Frey [mailto:cdfrey (AT) foursquare (DOT) net]=20
Sent: Wednesday, October 26, 2005 2:11 PM
To: plusplus (AT) lists (DOT) mysql.com
Subject: Re: Retrieving 300K+ records

On Wed, Oct 26, 2005 at 10:46:49AM -0700, Earl Miles wrote:
Quote:
Before you jump completely to C, you might try using the .at() =
function
to reference fields by index instead of by name. This removes the
repeated
name lookup on each cycle of your loop.

=20
The name lookups aren't what cause the slowdown (for me, at least) -- =
it's

Quote:
translating from a mysql row to a MySQL++ Row object that takes all =
the=20
time.
Interesting. Looking at the code, there are about 3 objects, one of =
which
contain a vector of strings, involved in moving the data.

I think this goes back to that iterator optimization that I never got
around to. :-) There might have even been a patch floating around if
memory serves.

- Chris


--=20
MySQL++ Mailing List
For list archives: http://lists.mysql.com/plusplus
To unsubscribe:
http://lists.mysql.com/plusplus?unsu...line (DOT) net




--
MySQL++ Mailing List
For list archives: http://lists.mysql.com/plusplus
To unsubscribe: http://lists.mysql.com/plusplus?unsu...ie.nctu.edu.tw



Reply With Quote
  #9  
Old   
Earl Miles
 
Posts: n/a

Default Re: Retrieving 300K+ records - 10-26-2005 , 01:42 PM



Alex Vishnev wrote:
Quote:
Chris,

The problem could very well be in iterator optimization. What I have done is
bypassed sql++ integrator and used mysql_fetch_row directly on returned
result set. After I obtain ResUse object reference, I retrieve MYSQL_RES
with mysql_result() member call from ResUse object. That's how I save 1/2 of
processing time. However, it is still not as good as loading data from a
file. right now, it is double the time it takes to load data from the file.
At the same time, if I go into mysql and copy all rows from one table into
another, it takes approx 1-2s.

mysql> create table abc select * from oldtab;
Query OK, 397913 rows affected (1.12 sec)
Records: 397913 Duplicates: 0 Warnings: 0

So I am thinking if internally mysql takes 1-2 sec when walking thru old
table and creating new table, I should get similar performance. Maybe I am a
little naïve, but.... I did not expect it to be 2 to 3 times worse than I
have seen in db.

Alex
Well, the query you're using is probably not walking through the records, but
using an optimization that just copies the entire file.

How long does it take to just do a "select * from oldtab" and let it scroll the
output to the screen?

--
MySQL++ Mailing List
For list archives: http://lists.mysql.com/plusplus
To unsubscribe: http://lists.mysql.com/plusplus?unsu...ie.nctu.edu.tw



Reply With Quote
  #10  
Old   
Chris Frey
 
Posts: n/a

Default Re: Retrieving 300K+ records - 10-26-2005 , 02:43 PM



On Wed, Oct 26, 2005 at 02:34:30PM -0400, Alex Vishnev wrote:
Quote:
Chris,

The problem could very well be in iterator optimization. What I have done is
bypassed sql++ integrator and used mysql_fetch_row directly on returned
result set. After I obtain ResUse object reference, I retrieve MYSQL_RES
with mysql_result() member call from ResUse object. That's how I save 1/2 of
processing time. However, it is still not as good as loading data from a
file. right now, it is double the time it takes to load data from the file.
I'm not surprised about that... loading from a file removes half (or more)
of the processing... that half is mysql. :-)


Quote:
At the same time, if I go into mysql and copy all rows from one table into
another, it takes approx 1-2s.

mysql> create table abc select * from oldtab;
Query OK, 397913 rows affected (1.12 sec)
Records: 397913 Duplicates: 0 Warnings: 0
That's not surprising either, since it is all in one SQL statement. If you
run that same statement through mysql++, no data will be transferred
between server and client, and the server does all the work, so the time
would be the same.


Quote:
So I am thinking if internally mysql takes 1-2 sec when walking thru old
table and creating new table, I should get similar performance. Maybe I am a
little na?ve, but.... I did not expect it to be 2 to 3 times worse than I
have seen in db.
When transferring data from server to client in an SQL system (any SQL
database), there is processing to do:

create the SQL stmt (this can be substantial in itself
in the data-write case, not currently under discussion)
send it
parse it
load the data from the database
format it into row-by-row chunks for socket connection and send
parse that on the client side as well
In mysql++'s case, this involves copying the data at least
2 or 3 times, not counting the C API, as I understand it.
This is probably the source of the bottleneck.

With a file, you only have:

open file
read file
parse file

Basically, you're doing only the server's work, and loading it straight
into memory.

Unfortunately, none of this helps you unless you want to tackle the
mysql++ parse step and submit an optimizing patch. I'd like to do it,
but I have just too much other stuff that must be done.

- Chris


--
MySQL++ Mailing List
For list archives: http://lists.mysql.com/plusplus
To unsubscribe: http://lists.mysql.com/plusplus?unsu...ie.nctu.edu.tw



Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.