dbTalk Databases Forums  

JOIN, or two separate queries?

comp.databases.mysql comp.databases.mysql


Discuss JOIN, or two separate queries? in the comp.databases.mysql forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Jason C
 
Posts: n/a

Default JOIN, or two separate queries? - 12-06-2011 , 12:28 AM






This is for a message board, where I have 2 tables: "subjects", and "posts".

Which is going to be faster and more efficient?

// Two queries:
SELECT SQL_CACHE lastmodified, subject, firstname, lastname, shortdesc
FROM subjects WHERE id=35481;

SELECT SQL_CACHE * FROM posts WHERE id=35481
ORDER BY postdate ASC LIMIT 21 OFFSET 0;

// Search 2 tables in 1 query
SELECT SQL_CACHE
subjects.lastmodified, subjects.subject, subjects.firstname, subjects.lastname, subjects.shortdesc,
posts.postdate, posts.username, posts.comment
FROM subjects, posts
WHERE subjects.id=35481 AND posts.id=35481
ORDER BY postdate ASC LIMIT 21 OFFSET 0


I'm not sure how to test the timing on this. If I do 2 queries, do I add the query times together to find the total, and then compare that to the query time of the one that's 1 query to see which is faster?

Reply With Quote
  #2  
Old   
Jerry Stuckle
 
Posts: n/a

Default Re: JOIN, or two separate queries? - 12-06-2011 , 04:57 AM






On 12/6/2011 1:28 AM, Jason C wrote:
Quote:
This is for a message board, where I have 2 tables: "subjects", and "posts".

Which is going to be faster and more efficient?

// Two queries:
SELECT SQL_CACHE lastmodified, subject, firstname, lastname, shortdesc
FROM subjects WHERE id=35481;

SELECT SQL_CACHE * FROM posts WHERE id=35481
ORDER BY postdate ASC LIMIT 21 OFFSET 0;

// Search 2 tables in 1 query
SELECT SQL_CACHE
subjects.lastmodified, subjects.subject, subjects.firstname, subjects.lastname, subjects.shortdesc,
posts.postdate, posts.username, posts.comment
FROM subjects, posts
WHERE subjects.id=35481 AND posts.id=35481
ORDER BY postdate ASC LIMIT 21 OFFSET 0


I'm not sure how to test the timing on this. If I do 2 queries, do I add the query times together to find the total, and then compare that to the query time of the one that's 1 query to see which is faster?
Did you try them and find out? There are way too many variables to
tell, i.e. database size, cardinality, defined indexes, buffer sizes,
system load, ... each installation can be different.

In general, it's not a good idea to use SELECT *. It can cause
hard-to-find problems in your programs should the database design
change. Additionally, if you add columns (i.e. a 2Mb BLOB), you will
fetch unnecessary data which you will never use. You should almost
always specify the columns.

You also don't need SQL_CACHE; MySQL will automatically cache results as
possible.

But it looks like you're falling into the trap of premature
optimization. Rather than worry about that now, define your database
and programs well and test. Then if you have a performance problem,
determine the cause of that problem and fix it.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex (AT) attglobal (DOT) net
==================

Reply With Quote
  #3  
Old   
Jason C
 
Posts: n/a

Default Re: JOIN, or two separate queries? - 12-06-2011 , 05:11 AM



On Tuesday, December 6, 2011 5:57:21 AM UTC-5, Jerry Stuckle wrote:
Quote:
Did you try them and find out? There are way too many variables to
tell, i.e. database size, cardinality, defined indexes, buffer sizes,
system load, ... each installation can be different.
That's the thing; I'm not sure HOW to determine which is faster. I know that the speed of either of the first queries is about the same as the speed in searching two tables in one query (ie, 0.007 to 0.008), but does that mean that running two separate queries would be twice as long as the one?


Quote:
In general, it's not a good idea to use SELECT *. It can cause
hard-to-find problems in your programs should the database design
change. Additionally, if you add columns (i.e. a 2Mb BLOB), you will
fetch unnecessary data which you will never use. You should almost
always specify the columns.

You also don't need SQL_CACHE; MySQL will automatically cache results as
possible.
That makes sense, thanks for the note.


Quote:
But it looks like you're falling into the trap of premature
optimization. Rather than worry about that now, define your database
and programs well and test. Then if you have a performance problem,
determine the cause of that problem and fix it.
The program has been running 2 queries (the first example) for about a yearwithout any problems. But right now, I'm going through and rebuilding the backend to be a little easier to moderate, and while doing so, I figure that this is a good time to change the query if doing the second example wouldbe better.

So, I don't think it's really premature optimization, so much as unnecessarily playing around :-)

Besides, I think of it more as a knowledge-gaining experience. I know that I won't physically see a difference that's just microseconds, but working with more and more complex queries helps me to understand the process behindit a little better.

Reply With Quote
  #4  
Old   
Jerry Stuckle
 
Posts: n/a

Default Re: JOIN, or two separate queries? - 12-06-2011 , 06:14 AM



On 12/6/2011 6:11 AM, Jason C wrote:
Quote:
On Tuesday, December 6, 2011 5:57:21 AM UTC-5, Jerry Stuckle wrote:
Did you try them and find out? There are way too many variables to
tell, i.e. database size, cardinality, defined indexes, buffer sizes,
system load, ... each installation can be different.

That's the thing; I'm not sure HOW to determine which is faster. I know that the speed of either of the first queries is about the same as the speed in searching two tables in one query (ie, 0.007 to 0.008), but does that mean that running two separate queries would be twice as long as the one?


In general, it's not a good idea to use SELECT *. It can cause
hard-to-find problems in your programs should the database design
change. Additionally, if you add columns (i.e. a 2Mb BLOB), you will
fetch unnecessary data which you will never use. You should almost
always specify the columns.

You also don't need SQL_CACHE; MySQL will automatically cache results as
possible.

That makes sense, thanks for the note.


But it looks like you're falling into the trap of premature
optimization. Rather than worry about that now, define your database
and programs well and test. Then if you have a performance problem,
determine the cause of that problem and fix it.

The program has been running 2 queries (the first example) for about a year without any problems. But right now, I'm going through and rebuilding the backend to be a little easier to moderate, and while doing so, I figure that this is a good time to change the query if doing the second example would be better.

So, I don't think it's really premature optimization, so much as unnecessarily playing around :-)

Besides, I think of it more as a knowledge-gaining experience. I know that I won't physically see a difference that's just microseconds, but working with more and more complex queries helps me to understand the process behind it a little better.
If things are working now, don't worry about it. Just because one way
might be a bit better than the other in this instance does not mean any
other query will act the same way.

Good database and application design will go a long ways to providing
good performance. But when you do run into a query that's taking too
long, you should diagnose why that is occurring. There may be lots of
reasons why that happens and finding it isn't always easy.

If you want to learn more about how to troubleshoot long queries, read
up on EXPLAIN and how it works. That will get you going in the right
direction.

And to tell whether one query or two is faster, create two scripts - one
with the single query and one with the two queries. Save the system
time before and after each set of calls, subtract the two and display
the time.

Stop and restart MySQL (to clear the cache) before each run of your
scripts and see how long they take. Run multiple times to get a good
average (other things running in the system will affect timings).

But it's really not important. Relational databases are called that for
a reason. Good programming means using your tools effectively.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex (AT) attglobal (DOT) net
==================

Reply With Quote
  #5  
Old   
Peter H. Coffin
 
Posts: n/a

Default Re: JOIN, or two separate queries? - 12-06-2011 , 03:20 PM



On Tue, 6 Dec 2011 03:11:24 -0800 (PST), Jason C wrote:

Quote:
On Tuesday, December 6, 2011 5:57:21 AM UTC-5, Jerry Stuckle wrote:

Did you try them and find out? There are way too many variables to
tell, i.e. database size, cardinality, defined indexes, buffer sizes,
system load, ... each installation can be different.

That's the thing; I'm not sure HOW to determine which is faster.
I know that the speed of either of the first queries is about the
same as the speed in searching two tables in one query (ie, 0.007 to
0.008), but does that mean that running two separate queries would be
twice as long as the one?
Why does it matter? "Faster" is a matter for academics and scholars.
Technicians only care about "fast enough" and "too slow". If you have
the latter, THEN you worry about how to make it into the former. Usually
by having the academics and scholars teach them in expensive classes.

--
38. If an enemy I have just killed has a younger sibling or offspring
anywhere, I will find them and have them killed immediately,
instead of waiting for them to grow up harboring feelings of
vengeance towards me in my old age. --Anspach's Evil Overlord list

Reply With Quote
  #6  
Old   
John Levine
 
Posts: n/a

Default Re: JOIN, or two separate queries? - 12-06-2011 , 06:25 PM



Quote:
This is for a message board, where I have 2 tables: "subjects", and "posts".

Which is going to be faster and more efficient?
You're fetching all of the subjects and all of the posts, but not
relating each subject record a specific post record. This isn't really
a join, so you're better off doing two queries.

The second alternative creates the cross-product of the two
selections, so if you have two subjects and five posts, you'll get
back ten rather redundant result records.

R's,
John



Quote:
// Two queries:
SELECT SQL_CACHE lastmodified, subject, firstname, lastname, shortdesc
FROM subjects WHERE id=35481;

SELECT SQL_CACHE * FROM posts WHERE id=35481
ORDER BY postdate ASC LIMIT 21 OFFSET 0;

// Search 2 tables in 1 query
SELECT SQL_CACHE
subjects.lastmodified, subjects.subject, subjects.firstname,
subjects.lastname, subjects.shortdesc,
posts.postdate, posts.username, posts.comment
FROM subjects, posts
WHERE subjects.id=35481 AND posts.id=35481
ORDER BY postdate ASC LIMIT 21 OFFSET 0


I'm not sure how to test the timing on this. If I do 2 queries, do I add
the query times together to find the total, and then compare that to the
query time of the one that's 1 query to see which is faster?

Reply With Quote
  #7  
Old   
Jerry Stuckle
 
Posts: n/a

Default Re: JOIN, or two separate queries? - 12-06-2011 , 07:01 PM



On 12/6/2011 7:25 PM, John Levine wrote:
Quote:
This is for a message board, where I have 2 tables: "subjects", and "posts".

Which is going to be faster and more efficient?

You're fetching all of the subjects and all of the posts, but not
relating each subject record a specific post record. This isn't really
a join, so you're better off doing two queries.

The second alternative creates the cross-product of the two
selections, so if you have two subjects and five posts, you'll get
back ten rather redundant result records.

R's,
John


No, he's fetching one subject and all the posts related to that one
subject. He just hasn't learned how to join tables.

And no, he needs to learn how to join tables, not make several queries
when one will do.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex (AT) attglobal (DOT) net
==================

Reply With Quote
  #8  
Old   
John Levine
 
Posts: n/a

Default Re: JOIN, or two separate queries? - 12-06-2011 , 09:16 PM



Quote:
This is for a message board, where I have 2 tables: "subjects", and "posts".

No, he's fetching one subject and all the posts related to that one
subject. He just hasn't learned how to join tables.
Depends what the id field is. Perhaps he could tell us.

R's,
John

Reply With Quote
  #9  
Old   
Jerry Stuckle
 
Posts: n/a

Default Re: JOIN, or two separate queries? - 12-06-2011 , 09:51 PM



On 12/6/2011 10:16 PM, John Levine wrote:
Quote:
This is for a message board, where I have 2 tables: "subjects", and "posts".

No, he's fetching one subject and all the posts related to that one
subject. He just hasn't learned how to join tables.

Depends what the id field is. Perhaps he could tell us.

R's,
John
From his query and questions, I would say it is the thread id.

If it weren't, his message board wouldn't be working now (assuming, of
course, this is the actual SQL he's using, which is pretty likely, IMHO).

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex (AT) attglobal (DOT) net
==================

Reply With Quote
  #10  
Old   
Jason C
 
Posts: n/a

Default Re: JOIN, or two separate queries? - 12-09-2011 , 03:49 AM



On Tuesday, December 6, 2011 10:51:12 PM UTC-5, Jerry Stuckle wrote:
Quote:
On 12/6/2011 10:16 PM, John Levine wrote:
This is for a message board, where I have 2 tables: "subjects", and "posts".

No, he's fetching one subject and all the posts related to that one
subject. He just hasn't learned how to join tables.

Depends what the id field is. Perhaps he could tell us.

R's,
John

From his query and questions, I would say it is the thread id.

If it weren't, his message board wouldn't be working now (assuming, of
course, this is the actual SQL he's using, which is pretty likely, IMHO).

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstu... (AT) attglobal (DOT) net
==================
You're correct in both counts; the `id` field is the thread ID, chosen by AI in the subjects, then assigned to posts to match. And, you are also correct that my intent with the second option was to get the one subject and allof the posts in one query.

The concern I had was that doing so would create more fields in one array, which may make the processing side a tad slower. Meaning, the first option would have one array with 1 index with 5 fields, then a second array with an infinite number of indexes with 3 fields each; or, the second option would have one array with an infinite number of indexes, with 8 fields each (with all of the fields from the `subject` table being duplicated in each index).

You're also correct that I've never really used JOIN, and never had a real reason to need it. If there's a correct way to use it that would inevitablycreate a faster, more efficient script, then I would greatly appreciate your advice.

As I said before, this is more of a learning experience for me, so even if the increased performance is marginal, it's still worth my time to learn a more efficient method.

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.