dbTalk Databases Forums  

Need Efficent Query

comp.databases comp.databases


Discuss Need Efficent Query in the comp.databases forum.



Reply
 
Thread Tools Search this Thread Display Modes
  #21  
Old   
Jerry Stuckle
 
Posts: n/a

Default Re: Need Efficent Query - 11-14-2007 , 12:13 PM






Roy Hann wrote:
Quote:
"Captain Paralytic" <paul_lautman (AT) yahoo (DOT) com> wrote in message
news:1195033185.507438.249070 (AT) 57g2000hsv (DOT) googlegroups.com...
On 13 Nov, 13:34, "Roy Hann" <specia... (AT) processed (DOT) almost.meat> wrote:
Actually there's another aspect that hasn't been mentioned yet and
that is the database design and the choice of indexes. My background
to using databases is ISAM on System 36 and CICS/VSAM. When I design
database tables (structure, indexes, ...), I'm always thinking of how
it will be used and what the data will look like. I know that if I
design it wrong, no optimzer in the world will ever be able to access
the data efficiently. It seems to me that, programmers who have no
appreciation of how to craft an efficient query, are unlikely to have
given sufficient thought to this important aspect.

I think this is starting to strain the definition of the term "programmer".
No doubt in small projects you do have jacks-of-all-trades and many of them
are just as good as you say. But even just moderately large projects will
have way more programmers writing applications to someone else's database
design. Whether the job titles distinguish them or not, the people doing
the database design are not the same as the much larger number of people
doing most of the coding.

Actually, Paul brings up some excellent points. And no, I don't think
he is straining the definition of "programmer". Any database design is
dependent on the data and how it must be accessed (queries). There are
certain rules for normalization, but the higher the normalization level,
the slower the access can be. Therefore there are times when the
normalization rules have to be bent - more often in large databases.
3NF is a pretty good goal to shoot for, but not always possible and
still get good access time.

And even in large projects with dedicated database designers, the
designers always have some programming experience, and work with
programmers to jointly develop the best design for the project.

Quote:
I don't see a great
deal of difference between helping the optimizer by good design of
database and helping it with good design of query.

A good database design benefits all the programmers without them even being
aware of it. They get the benefit of it without having to do anything
special. They also automatically get the benefit of any future server-side
performance enhancements. This is hugely significant. In July I went to
one site and set up rule-based table partitioning. The application speeded
up by a factor of more than 10 without any code changes. In September I
went to another site confidently expecting to pull off the same trick and
found I would have to re-implement the entire application to make it work,
because their code was written so that it couldn't exploit new server-side
features.

It's true that a database design will benefit the programmers. However,
at the same time, the programmers must still be able to write efficient
queries to access that data. Database designers aren't going to do that
for them.

Quote:
And let's not forget platform-independence. I don't personally worship at
application-portability alter, and I don't think many people really do. But
even though it doesn't matter whether an application is
platform-independents, programmers are better off being
platform-independent.

Roy

I know of a lot of companies who strive towards as much platform
independence as possible. And look at all of the platform independent
open source projects out there.

Platform independence is a good thing. Now I'm not saying everything
has to be platform independent. However, if you design your databases
and queries properly, much of your code can be completely platform
independent.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex (AT) attglobal (DOT) net
==================



Reply With Quote
  #22  
Old   
Jerry Stuckle
 
Posts: n/a

Default Re: Need Efficent Query - 11-14-2007 , 01:32 PM






Roy Hann wrote:
Quote:
"Jerry Stuckle" <jstucklex (AT) attglobal (DOT) net> wrote in message
news:KdadnbLW3LVyn6banZ2dnUVZ_o-mnZ2d (AT) comcast (DOT) com...
Roy Hann wrote:

Having to know a special, usually non-obvious, non-portable formulation
for doing common tasks isn't a defect?
What is "special, usually non-obvious, non-portable"? In MySQL, you can
do it all in ANSI standard SQL. Portable to every database which supports
ANSI SQL.

Sure the syntax is portable. No one has ever argued that it isn't. But you
just got through telling me that what is efficient SQL to one DBMS may not
be efficient to another (and that is true). The whole conversation has
been about efficiency from the start (look at the Subject heading), not
syntax. I have nothing to say about syntax.

True, and for maximum efficiency, you may need to tune a few queries if
you switch databases. But in my experience, that is seldom necessary,
and if it is, it is a very minor number of queries which need to be
adjusted.

Quote:
But if you think a RDBMS should be able to optimize everything,
Are you suggesting that isn't a desirable aspiration? Why should I want
an RDBMS that doesn't if I could have one that does?
No, the impossible is not a desirable aspiration. Reality should be,
though.

You are the one who wanted me to write my own DBMS. I just wanted to be
able to rank real products so I could choose the best one. Who's being more
realistic?

And you're expecting perfection from any database. That's impossible,
so it is not nearly as realistic as you writing a RDBMS - which,
although difficult, would not be impossible.

Quote:
then why don't you write your own?
For the very simple reason that I wouldn't know how. But what I do know
is how to compare existing products and rank them by how close to my
ideal they get, so that I can choose the best of the many real products I
can actually get/buy today.
Then you shouldn't be complaining that programmers who DO know it well
aren't getting it right. Perhaps it's a lot harder than you think - i.e.
impossible - for them to get it right.

First of all there is never a burden on any consumer to know how to design a
better product before they ask for it. Do you know how to design a car?
That didn't stop you wanting a better car than your first one though, did
it? Secondly, I am not asking for the impossible. I said products already
exist that do what I was talking about. I was not asking anyone to build a
never-before-seen feature.

And which products are those? I know of none which do all you ask. If
they exist, then why are we even discussing it here? Just use those
products.

Quote:
SQL was the name given to the language component of System R, but it was
derived from other, earlier projects at IBM and elsewhere, including a
language called SQUARE. But I will concede this point. Where it came
from is irrelevant to my main point that in the 21st century optimizers
can reasonably be expected to be highly intelligent.
Actually, not. Sure they got some ideas from SQUARE - but SEQUEL was a
completely separate implementation. But if you want to say that by
getting ideas from SQUARE SEQUEL was derived, then I could point out
SQUARE got ideas from COBOL (which got ideas from Assembler...).

As I said, I concede the point, although I struggle to see what COBOL and
Assembler have to do with it.

Just that SQUARE used ideas gained from COBOL, and COBOL used ideas
gained from Assembler. So by your logic, SQL is descended from
Assembler (or, if you want to go back to the very first computers,
machine code).

Quote:
Sure. Experts will always be needed, but one should hope they'll be
needed with ever-decreasing frequency and only for ever more
complicated queries. The OP's query was, IMO, capable of a trivial
optimization and there are products that you can use today that would
have found it automatically.
And you don't have to be an expert to learn how to write better SQL
statements. Just a programmer.
What an absurd assertion. What is the difference between a programmer
who writes one correct SQL statement (like the OP) and a programmer who
knows how to write a better one? Isn't that the very meaning of
"expert"?
Not at all. Expert programmers can write much more complicated SQL
statements.

This conversation is about the ability of optimizers to take a correct
specification of a required result and generate an optimal execution plan.
No one is talking about the limits of the complexity of the code programmers
can write. But you do raise an important distinction. There are two kinds
of expertise involved. One kind of expertise is having sufficient
understanding to express a desired result, and the other is knowing which of
the very many equivalent ways of expressing it will work efficiently. I say
the first skill will always be valuable, but the value of the second is
inversely proportional to how good the optimzer is. And one hopes
optimizers are getting better all the time, so one also necessarily also
hopes the second kind of expertise becomes less valuable.

You do hope optimizers get better don't you?

Roy

Sure, I hope optimizers get better. But it will be many decades (if not
longer) before they can get anywhere near what you're looking for.

In the meantime, good programmers using RDBMS's need to know how to
write optimum SQL statements and help design efficient database structure.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex (AT) attglobal (DOT) net
==================



Reply With Quote
  #23  
Old   
Ed Prochak
 
Posts: n/a

Default Re: Need Efficent Query - 11-14-2007 , 04:31 PM



On Nov 13, 8:34 am, "Roy Hann" <specia... (AT) processed (DOT) almost.meat>
wrote:
Quote:
"Captain Paralytic" <paul_laut... (AT) yahoo (DOT) com> wrote in message

news:1194956220.063305.125300 (AT) d55g2000hsg (DOT) googlegroups.com...

On 13 Nov, 12:07, "Roy Hann" <specia... (AT) processed (DOT) almost.meat> wrote:
I also expect that most working programmers have
little or no interest in SQL, regard it as a distraction,
99.7% of all statistics are made up. I suspect this comment from the
"Captain" is too.
Quote:
That's why there are SQL programmers!

I take it you mean *expert* SQL programmers?

Sure, there are such people. But I suspect there are very many more
and Roy joins in with his unsupported claim. Two people said it so it
must be true!

Quote:
programmers writing Java and Cobol and PHP and so on, who are compelled to
use SQL but regard it as something other than a core skill, and know just
enough to get by. They don't want to take the time to become highly
proficient with SQL. They dislike it; they probably feel it gets in the way
more than it helps, and if they have to repeatedly reformulate a query
looking for that one special way of expressing it so it works tolerably
well, then they would (rightly) think that's just voo-doo and hate SQL even
more.
And it is sad that they feel that way. They possibly think programming
in only one language, like java or COBOL, is "A good thing". And many
poeple around the world think Americans are self-centered. How much
moreso are these kinds of programmers.

Quote:
SQL was intended to be a goal-oriented language. The idea was (explicitly)
to allow the optimizer to discern the essential meaning of the query--no
matter how it was expressed--and come up with an efficient way of producing
the requested result. That's how it is supposed to work, and lots of
products achieve that to a remarkable degree.
Wrong.
Optimization is a late addition to SQL. It was originally intended to
be a non-programmer's programming language. The idea was Managers
should be able to write a query to get a report with what they need
without bothering a programmer. needless-to-say a good number of
managers do not understand any select statement beyond a two table
join. So writing queries fell back into the laps of programmers.

(The fact that SQL had a nice translation to relational logic which
was being developed about the same time only helped it along.)
Quote:
Flipping it around, if we had a perfect SQL engine, no one would ever ask
to add a feature so they could write an equivalent query that would be
slower! :-)
We are still waiting for the DWIMNWIS computer. I think it will be a
long wait.

(DWIMNWIS = Do What I Mean, Not What I Say)
Quote:
Expert SQL programmers are actually a bad thing. We shouldn't want to want
them.

Roy
To the extent that we have people that are Expert SQL programmers and
unable to program in anything else, I would agree that this is a bad
thing. Otherwise, I think your optimizer arguement would apply to
about any programming language.

Ed



Reply With Quote
  #24  
Old   
Ed Prochak
 
Posts: n/a

Default Re: Need Efficent Query - 11-14-2007 , 04:49 PM



On Nov 13, 9:32 am, "Roy Hann" <specia... (AT) processed (DOT) almost.meat>
wrote:
Quote:
"Jerry Stuckle" <jstuck... (AT) attglobal (DOT) net> wrote in message

news:nYKdnbGMULq9MKTanZ2dnUVZ_sqinZ2d (AT) comcast (DOT) com...

Roy Hann wrote:
[]
SQL was intended to be a goal-oriented language. The idea was
(explicitly) to allow the optimizer to discern the essential meaning of
the query--no matter how it was expressed--and come up with an efficient
way of producing the requested result. That's how it is supposed to
work, and lots of products achieve that to a remarkable degree.

No, SQL (Structured Query Language) was created to access data in IBM's
System R relational database. And back in the early 70's, optimizers were
a pipe dream.

First of all that doesn't refute anything I said.
Yes it does. look at your paragraph above (the one with "explicitly"
in it)
Optimization was NOT part of the original goal or purpose of SQL.

Quote:
... Secondly SQL (nee SEQUEL)
predates System R by many years--almost a decade.
Ah good you do know some history, even though you may misremember some
of it.

Quote:
... SQL was added to System R
very late in the day. Thirdly there were quite reasonable RDBMS optimizers
by the late 70s. And fourthly, we are in the 21st century now and what was
dreamt of 30 years ago is eminently possible now--as demonstrated daily by
the better products you can actually buy or download today.
As we increased the power of hardware, we also make heavier demands.
The same query on 100,000 rows on a 1MHz CPU might very well finish
faster than on 10,000,000 rows on a 1GHz CPU. Optimizer improvements
can only take us so far.

Yes, better products should be rewarded. Now define better so that
everyone agrees and I'll nominate you for the Nobel Peace prize.


[]
Quote:
Flipping it around, if we had a perfect SQL engine, no one would ever
ask to add a feature so they could write an equivalent query that would
be slower! :-)

Expert SQL programmers are actually a bad thing. We shouldn't want to
want them.

And if we had perfect computer systems, we would never need programmers.
People would be able to just speak into a microphone and get the response
to their questions.

Programmers are actually a bad thing. We shouldn't want to want them.

Maybe you think that's whimsical, but actually that is exactly what I think,
in absolute, complete seriousness.

Roy
I have heard of such a computer. Deep Thought reportedly computed the
answer to Life, the Universe, and Everything. The answer, as you may
have heard, was 42. It only took it 6million years to compute it!
8^)

Now what was the question?
Ed





Reply With Quote
  #25  
Old   
Ed Prochak
 
Posts: n/a

Default Re: Need Efficent Query - 11-14-2007 , 04:52 PM



On Nov 14, 4:39 am, Captain Paralytic <paul_laut... (AT) yahoo (DOT) com> wrote:
Quote:
On 13 Nov, 13:34, "Roy Hann" <specia... (AT) processed (DOT) almost.meat> wrote:





"Captain Paralytic" <paul_laut... (AT) yahoo (DOT) com> wrote in message

news:1194956220.063305.125300 (AT) d55g2000hsg (DOT) googlegroups.com...

On 13 Nov, 12:07, "Roy Hann" <specia... (AT) processed (DOT) almost.meat> wrote:
I also expect that most working programmers have
little or no interest in SQL, regard it as a distraction,

That's why there are SQL programmers!

I take it you mean *expert* SQL programmers?

Sure, there are such people. But I suspect there are very many more
programmers writing Java and Cobol and PHP and so on, who are compelled to
use SQL but regard it as something other than a core skill, and know just
enough to get by. They don't want to take the time to become highly
proficient with SQL. They dislike it; they probably feel it gets in the way
more than it helps, and if they have to repeatedly reformulate a query
looking for that one special way of expressing it so it works tolerably
well, then they would (rightly) think that's just voo-doo and hate SQL even
more.

SQL was intended to be a goal-oriented language. The idea was (explicitly)
to allow the optimizer to discern the essential meaning of the query--no
matter how it was expressed--and come up with an efficient way of producing
the requested result. That's how it is supposed to work, and lots of
products achieve that to a remarkable degree.

Actually there's another aspect that hasn't been mentioned yet and
that is the database design and the choice of indexes. My background
to using databases is ISAM on System 36 and CICS/VSAM. When I design
database tables (structure, indexes, ...), I'm always thinking of how
it will be used and what the data will look like. I know that if I
design it wrong, no optimzer in the world will ever be able to access
the data efficiently. It seems to me that, programmers who have no
appreciation of how to craft an efficient query, are unlikely to have
given sufficient thought to this important aspect. I don't see a great
deal of difference between helping the optimizer by good design of
database and helping it with good design of query.
Excellent point!




Reply With Quote
  #26  
Old   
Ed Prochak
 
Posts: n/a

Default Re: Need Efficent Query - 11-14-2007 , 05:17 PM



On Nov 14, 11:36 am, "Roy Hann" <specia... (AT) processed (DOT) almost.meat>
wrote:
[]
Quote:
This conversation is about the ability of optimizers to take a correct
specification of a required result and generate an optimal execution plan.
Have ou considered anywhere in your evaluations of optimizers exectly
what the problem involves? There are problems that for a given volume
of data cannot be solved in any reasonable amount of time. The best
execution plan for a query involves traversing tables in a way that is
exactly equivalent to the travelling salesman problem. It is
essentially impossible to solve with out using a brute force approach.
So you can dream of optimizers getting better, but they are going
against a fundamental limit. There is a point they cannot do any
better.

Quote:
No one is talking about the limits of the complexity of the code programmers
can write. But you do raise an important distinction. There are two kinds
of expertise involved. One kind of expertise is having sufficient
understanding to express a desired result, and the other is knowing which of
the very many equivalent ways of expressing it will work efficiently. I say
the first skill will always be valuable, but the value of the second is
inversely proportional to how good the optimzer is. And one hopes
optimizers are getting better all the time, so one also necessarily also
hopes the second kind of expertise becomes less valuable.
Keep dreaming.
Quote:
You do hope optimizers get better don't you?

Roy
Yes, but they are stuck against a limit. For any given query, they can
only get incrementally better until they reach that limit and then not
better at all. It's kind of like the Speed of light in physics. Unless
you have some tacheons in your pocket, you can't get there any faster.

Ed



Reply With Quote
  #27  
Old   
Roy Hann
 
Posts: n/a

Default Re: Need Efficent Query - 11-15-2007 , 03:26 AM



"Ed Prochak" <edprochak (AT) gmail (DOT) com> wrote

Quote:
On Nov 14, 11:36 am, "Roy Hann" <specia... (AT) processed (DOT) almost.meat
wrote:
[]
This conversation is about the ability of optimizers to take a correct
specification of a required result and generate an optimal execution
plan.

So you can dream of optimizers getting better, but they are going
against a fundamental limit. There is a point they cannot do any
better.
Maybe you haven't followed this conversation from the start so I will repeat
this for you. I am only saying some *existing* optimizers are better than
others. A reasonable person would prefer the one that is best overall,
meaning the one that most frequently comes up with the best plan for a
getting a required set of data, so that changes to correct SQL to get *that
product's* best plan are needed as infrequently as possible. Note that I am
talking about the "best plan" in the context of that particular DBMS--not
some fantastic plan that particular engine can't even execute.

Quote:
No one is talking about the limits of the complexity of the code
programmers
can write. But you do raise an important distinction. There are two
kinds
of expertise involved. One kind of expertise is having sufficient
understanding to express a desired result, and the other is knowing which
of
the very many equivalent ways of expressing it will work efficiently. I
say
the first skill will always be valuable, but the value of the second is
inversely proportional to how good the optimzer is. And one hopes
optimizers are getting better all the time, so one also necessarily also
hopes the second kind of expertise becomes less valuable.

Keep dreaming.
What am I dreaming about? Optimizers have improved continuously since they
were invented. Are you saying that you think optimization technology has
maxed out and that all implementations have achieved the same ultimate
state-of-the-art and there are now no differences between them?

Quote:
[snip] but they are stuck against a limit. For any given query, they can
only get incrementally better until they reach that limit and then not
better at all. It's kind of like the Speed of light in physics. Unless
you have some tacheons in your pocket, you can't get there any faster.
I am not aware of any reason to suppose we've got much further than a brisk
walking pace yet. But I know nothing about the limits of what is possible
in principle. I am talking about the limits of what has already been
achieved.

Roy




Reply With Quote
  #28  
Old   
Roy Hann
 
Posts: n/a

Default Re: Need Efficent Query - 11-15-2007 , 05:51 AM



"Ed Prochak" <edprochak (AT) gmail (DOT) com> wrote

Quote:
On Nov 13, 8:34 am, "Roy Hann" <specia... (AT) processed (DOT) almost.meat
wrote:
"Captain Paralytic" <paul_laut... (AT) yahoo (DOT) com> wrote in message

news:1194956220.063305.125300 (AT) d55g2000hsg (DOT) googlegroups.com...

On 13 Nov, 12:07, "Roy Hann" <specia... (AT) processed (DOT) almost.meat> wrote:
I also expect that most working programmers have
little or no interest in SQL, regard it as a distraction,

99.7% of all statistics are made up. I suspect this comment from the
"Captain" is too.
Er, actually Captain P didn't write that, I did. And I didn't offer it as a
statistic, I said I "expect" that is the case. That pretty clearly flags it
as just my opinion. If I had any statistics you can be sure I'd assert it
as an undisputed fact, loud and clear.

Quote:
That's why there are SQL programmers!

I take it you mean *expert* SQL programmers?

Sure, there are such people. But I suspect there are very many more

and Roy joins in with his unsupported claim. Two people said it so it
must be true!
Nope still just me, and again I clearly indicated it is just a suspicion, so
I really can't be criticised for not supporting it with evidence. On the
other hand it is not a very far-fetched suspicion. Feel free to cite
evidence to contradict me or we're just as unreliable as each other.

Quote:
They dislike [SQL]; they probably feel it gets in the way
more than it helps, and if they have to repeatedly reformulate a query
looking for that one special way of expressing it so it works tolerably
well, then they would (rightly) think that's just voo-doo and hate SQL
even
more.

And it is sad that they feel that way. They possibly think programming
in only one language, like java or COBOL, is "A good thing". And many
poeple around the world think Americans are self-centered. How much
moreso are these kinds of programmers.
You miss the point. SQL just describes the required result. The obvious
way to express the query can work perfectly well in one product and very
badly in another. Since the programmer has done nothing wrong, and has not
coded a bad algorithm, they basically have to do a Monte Carlo search for
another way of expressing the result they need that just happens to tip
their optimizer into doing the job efficiently. There is no general
principle they could follow, they just have to know, or get lucky. Most
programmers who are new to SQL find that bewildering, frustrating, and
unhelpful. It sure isn't desirable behaviour, is it?

Quote:
SQL was intended to be a goal-oriented language. The idea was
(explicitly)
to allow the optimizer to discern the essential meaning of the query--no
matter how it was expressed--and come up with an efficient way of
producing
the requested result. That's how it is supposed to work, and lots of
products achieve that to a remarkable degree.

Wrong.
About what? I already cited Chamberlin and Boyce's own words. Let me do it
again: "The work on [SQL]...is consistent with the trend to declarative
problem specification."

Quote:
Optimization is a late addition to SQL.
I guess that depends when "late" starts. It goes back at least 20 years in
some products, and maybe more. Long enough that is doesn't matter in the
context of this conversation.

Quote:
It was originally intended to
be a non-programmer's programming language.
Partially correct, yes, but that was literally the second explicit
objective. The first stated objective (and here I quote Chamberlin and
Boyce again) was: "...to lower software costs among professional
programmers."

Quote:
(The fact that SQL had a nice translation to relational logic which
was being developed about the same time only helped it along.)
Actually that is not true. SQL/SEQUEL/SQUARE/etc. were at best influenced
by relational logic. Even at the time there were bitter rows within IBM
over the poor fidelity to relational algebra and to the relational model,
and it has only got worse. But we digress.

Quote:
Flipping it around, if we had a perfect SQL engine, no one would ever
ask
to add a feature so they could write an equivalent query that would be
slower! :-)

We are still waiting for the DWIMNWIS computer. I think it will be a
long wait.

(DWIMNWIS = Do What I Mean, Not What I Say)
That isn't the problem. That is exactly what the problem is not. The
problem is that a reasonable person (such as the OP of this thread) can
write a query that says *exactly* what he means, and it still isn't good
enough. He has to discover some equivalent but different formulation,
somehow, by no means anyone can explain except by cataloging "tricks that
have worked in the past and might continue to work in the future".

Quote:
Expert SQL programmers are actually a bad thing. We shouldn't want to
want
them.

To the extent that we have people that are Expert SQL programmers and
unable to program in anything else, I would agree that this is a bad
thing. Otherwise, I think your optimizer arguement would apply to
about any programming language.
No, SQL is unlike any other comparably widely used language because it is
declarative.

Roy




Reply With Quote
  #29  
Old   
Peter H. Coffin
 
Posts: n/a

Default Re: Need Efficent Query - 11-15-2007 , 10:34 AM



On Wed, 14 Nov 2007 01:39:45 -0800, Captain Paralytic wrote:
Quote:
Actually there's another aspect that hasn't been mentioned yet and
that is the database design and the choice of indexes. My background
to using databases is ISAM on System 36 and CICS/VSAM. When I design
database tables (structure, indexes, ...), I'm always thinking of how
it will be used and what the data will look like. I know that if I
design it wrong, no optimzer in the world will ever be able to access
the data efficiently. It seems to me that, programmers who have no
appreciation of how to craft an efficient query, are unlikely to have
given sufficient thought to this important aspect. I don't see a great
deal of difference between helping the optimizer by good design of
database and helping it with good design of query.
There is, however, a difference between REALLY helping the optimizer and
just thinking you're helping, and the latter is often part and parcel
with the whole concept of "premature optimization" on the part of the
programmer.

--
86. I will make sure that my doomsday device is up to code and properly
grounded.
--Peter Anspach's list of things to do as an Evil Overlord


Reply With Quote
  #30  
Old   
David Cressey
 
Posts: n/a

Default Re: Need Efficent Query - 11-16-2007 , 08:45 AM




"ZeldorBlat" <zeldorblat (AT) gmail (DOT) com> wrote

Quote:
On Nov 12, 8:52 pm, Dean.Brot... (AT) gmail (DOT) com wrote:
I'm hoping some SQL gurus read this and help me optimize my table/
query.

The table is shown below:

CREATE TABLE IF NOT EXISTS PACKAGE_TABLE (
ID BIGINT UNSIGNED PRIMARY KEY NOT NULL
AUTO_INCREMENT,
TAG_ID BIGINT UNSIGNED NOT NULL,
LAST_UPDATE TIMESTAMP,
JOB_NAME MEDIUMTEXT,
STATE TEXT,
ERROR_CODES INT UNSIGNED,
OPERATOR MEDIUMTEXT,
TAG_FIRMWARE TINYTEXT,
.
.
.
PLUS 5 OTHER TEXT FIELDS NOT IMPORTANT FOR THIS DICSUSSION
)

In the table I will be storing multiple records keyed by the same
TAG_ID. Essientlly every iteraction of each tag is stored and thus I
can recover a tag's history.

For an important report I want to query for the last update for each
TAG_ID. I have come up with the following:

SELECT * FROM PACKAGE_TABLE t1 WHERE JOB_NAME = 'Some Job' AND
LAST_UPDATE = (SELECT MAX(t2.LAST_UPDATE) FROM PACKAGE_TABLE t2 where
t1.TAG_ID = t2.TAG_ID)

This query works but on large tables (10000+ tag ids) the query really
slows. Does anyone know of a more efficient way to format the table/
query for increased speed.

No guarantees, but try writing it as a join instead:

select *
from package_table p1
join (select p2.tag_id, max(p2.last_update) last_update
from package_table p2
group by p2.tag_id) x
on (p1.tag_id = x.tag_id
and p1.last_update = x.last_update)
where job_name = 'Some Job'

I believe that Roy Hann's responses in this topic are quite correct with
regard to what one may reasonably expect from a good DBMS. Without
detracting one iota from Roy's response, I want to explore what some query
optimizers would do with either the original query or the suggestion you
made. I haven't checked your suggestion for logical correctness, but it
looks good on a casual read. I wonder if it helped the original poster?

Anyway, here's a strategy that might be proposed to perform the original
query:

Do a table scan to pick up all the rows of the table. This involves
accessing 10,000 rows.
Within each iteration of the loop that scans the table, perform the
subquery.
Perform the subquery by doing a table scan of the table to determine the max
value for the tag_id currently seen in the outer loop. This involves
accessing 10,000 rows 10,000 times, or 100,000,000 row accesses.

Even if 99 out of 100 row acesses are performed with no disk i/o, this is
still going to be an intolerably slow strategy.

Next, let's try this strategy:

Do a table scan to pick up all the rows of the table. This involves
accessing 10,000 rows.
Within each iteration of the loop that scans the table, perform the
subquery.

This time, use an index to restrict the search to only the rows with the
corresponding tag_id.
This results in reducing the number of row accesses to 20,000, but adds
10,000 index lookups to the process. It should be on the order of 100 to
1000 times faster than the first strategy. It, of coruse, depends on the
existence of an index on tag_id. A really clever optimizer might choose to
create an index on the fly, if necessary. That's still faster than the
first strategy.

Now, let's try a third strategy:

This strategy depends on a sorted index on tag_id, or generating one on the
fly as discussed before.

Retrieve the rows from the table in order by ascending (or descending, it
doesn't matter) tag_id. Only evaluate the subquery when a row is retrieved
with a new tag _id, one that differs from the tag_id of the previous row
processed. When the subquery is evaluated, use the sorted index to locate
the relevant rows, as discussed above. This reduces the number of index
lookups in the subquery to about 100.


Finally, the last strategy:

Perform a sort-merge-join strategy on the scan of the table in tag_id order,
using a sorted index on tag_id, as discussed above. This involves walking
the entire index twice, and retrieving 20,000 table rows.

Compared to the first strategy, it should be blindingly fast.

Now, I want to return to the question of whether your suggestion might have
been helpful to the original poster. I can easily imagine an optimizer that
would pick the last strategy for the join solution you presented, but pick
one of the other strategies for the query as originally presented in the
topic starter.

So it's possible that your query is both logically correct, and up to 100
times as fast as the original query. If so, I would hope the original
poster would thank you.

It's to be emphasized that the above discussion of strategies is all
generally under the covers, and not exposed to the SQL programmer.
However, some DBMS products allow the SQL programmer to expose the
optimizer's strategy, and even to influence it, via "hints".

Now let me return to the comments that Roy Hann has made in here. It's
entirely reasonable to expect an industrial strength DBMS to contain a very
clever optimizer these days. It's entirely reasonable to expect the
optimizer to rank the four strategies I've outlined above from slower to
faster, in exactly the same order I've given. It's reasonable to expect a
really good optimizer to consider at leat one of the three faster strategies
regardless of how the query is expressed.

So, over all, I have to agree with Roy. A good application (or report)
developer should be able to express the query in whatever form seems
"natural" to the author, and to presumed maintainers, and to expect the
optimizer to take care of the details.

In situations where the optimizer picks a very slow strategy, the kind of
"expert SQL programming" suggested by others may be apropos. Updgrading
to an industrial strength DBMS may be apropos, if this sort of thing slows
down development effort repeatedly, and if the cost of slowing down
development effort is greater than the cost of updgrading to a different
DBMS.

Hope this helps.











Reply With Quote
Reply




Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.