dbTalk Databases Forums  

Why is "group by" obligatory in SQL?

comp.databases.theory comp.databases.theory


Discuss Why is "group by" obligatory in SQL? in the comp.databases.theory forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Hans Mayr
 
Posts: n/a

Default Why is "group by" obligatory in SQL? - 07-23-2009 , 04:46 AM






Hello,

I still don't know if there is a better group to post general
questions about SQL and the background. So I hope nobody minds ;-)

My question is: Why do you have to state the GROUP BY explicitly in
SQL? Why isn't it enough to write "select a_field, sum(b_field) from
c_table;"? What additional value is generated by "group by a_field"?

Thanks and best,

Hans

Reply With Quote
  #2  
Old   
Walter Mitty
 
Posts: n/a

Default Re: Why is "group by" obligatory in SQL? - 07-23-2009 , 07:27 AM






"Hans Mayr" <mayr1972 (AT) gmx (DOT) de> wrote

Quote:
Hello,

I still don't know if there is a better group to post general
questions about SQL and the background. So I hope nobody minds ;-)

My question is: Why do you have to state the GROUP BY explicitly in
SQL? Why isn't it enough to write "select a_field, sum(b_field) from
c_table;"? What additional value is generated by "group by a_field"?

Thanks and best,

Hans
It's a good question. I can't say why the language designers made the
choices that they did. But I see some problems with asking the parser to
infer a GROUP BY when necessary.

Consider the following

select Last_Name, substring (First_Name, 1, 3) from Persons;

Grouping by last name would be a mistake in this case. "substring" is not
an aggregating function. So the parser would have to be able to distinguish
between aggregating and non aggregating functions. How would this work when
user defined functions can be added to the mix?

Another problem is how you interpret a select with no functions at all.
Consider this:

select Last_Name, First_Name from Persons;

Should this be grouped by Last_Name and First_Name? Doing so would yield
the same answer as

select distinct Last_Name, First_Name from Persons;

But as things stand, it yields the same result as

select all Last_Name, FirstName from Persons;

For some reason that eludes me, the SQL designers chose the default keyword
here to be "all" rather than "distinct". Maybe they were thinking of
execution speed, in the absence of any really good optimizer.

If you left the meaning of a query with no functions as is, it would seem to
be inconsistent with the construct you asked for in your OP. That's my
subjective response, anyway.

Reply With Quote
  #3  
Old   
Bob Badour
 
Posts: n/a

Default Re: Why is "group by" obligatory in SQL? - 07-23-2009 , 08:20 AM



Hans Mayr wrote:

Quote:
Hello,

I still don't know if there is a better group to post general
questions about SQL and the background. So I hope nobody minds ;-)

My question is: Why do you have to state the GROUP BY explicitly in
SQL? Why isn't it enough to write "select a_field, sum(b_field) from
c_table;"? What additional value is generated by "group by a_field"?

Thanks and best,

Hans
Because SQL made the mistake of allowing duplicate rows and not
requiring candidate keys, in SQL, one could group by addtional columns:

select a_field, sum(b_field)
from c_table
group by a_field, d_field;

Reply With Quote
  #4  
Old   
Hans Mayr
 
Posts: n/a

Default Re: Why is "group by" obligatory in SQL? - 07-23-2009 , 03:41 PM



Thanks Walter for your answer. In my mind there was an intelligent
parser, which always knows whether to aggregate (if there is an
aggregate function) or not (if not). I don't know if it was difficult
to program such an parser but the way SQL is defined you might need to
write a couple of words more but at least it is clear. At least
regarding the grouping. Thanks again for shedding light on the aspects
you mentioned.

Best,

Hans

Reply With Quote
  #5  
Old   
Roy Hann
 
Posts: n/a

Default Re: Why is "group by" obligatory in SQL? - 07-24-2009 , 03:22 AM



Hans Mayr wrote:

Quote:
Hello,

I still don't know if there is a better group to post general
questions about SQL and the background. So I hope nobody minds ;-)

My question is: Why do you have to state the GROUP BY explicitly in
SQL? Why isn't it enough to write "select a_field, sum(b_field) from
c_table;"? What additional value is generated by "group by a_field"?
Who can know how the original designers arrived at their decision?
There were certainly other languages from the early SQL era (such as
QUEL) that would allow a similar formulation to what you propose,
meaning what you intend. (And indeed QUEL would allow other sensible
things with aggregates that SQL still doesn't easily allow.)

Personally I imagine the logic of SQL aggregation got defined in haste,
was "good enough", and they just moved swiftly on to the next challenge.
A small team, internal IBM rivalries, publication demands, etc. finished
the job.

--
Roy

Reply With Quote
  #6  
Old   
Cimode
 
Posts: n/a

Default Re: Why is "group by" obligatory in SQL? - 07-24-2009 , 04:06 AM



Snipped.
Quote:
Who can know how the original designers arrived at their decision?
Probably because they believed in the myth of natural programming
languages. SQL is no exception.

Dijskra clearly described the fundamental problem below...

The virtue of formal texts is that their manipulations, in order to be
legitimate, need to satisfy only a few simple rules; they are, when
you come to think of it, an amazingly effective tool for ruling out
all sorts of nonsense that, when we use our native tongues, are almost
impossible to avoid.

(http://www.cs.utexas.edu/~EWD/transc...xx/EWD667.html)

Quote:
There were certainly other languages from the early SQL era (such as
QUEL) that would allow a similar formulation to what you propose,
meaning what you intend. *(And indeed QUEL would allow other sensible
things with aggregates that SQL still doesn't easily allow.)

Personally I imagine the logic of SQL aggregation got defined in haste,
was "good enough", and they just moved swiftly on to the next challenge.
A small team, internal IBM rivalries, publication demands, etc. finished
the job.
When looking at the grammar of the initial BS12, it seems obvious that
we slowly shifted from a declarative oriented language to a verbose
driven language.

In BS12 the expresssion of aggregation was by the assignment of
variables seprating the aggregates from the operation such as

T1 = SUMMARY(EMP, GROUP(DEPTNUM), EMPS=COUNT, SALSUM=SUM(SALARY))
T2 = JOIN(T1, DEPT)
T3 = SELECT(T2, SALSUM > BUDGET)

comparing with SQL

SELECT d.Deptnum, Count(*) as Emps,
Sum(e.Salary) as Salsum, Budget
FROM Emp as e
JOIN Dept as d ON e.Deptnum = d.Deptnum
GROUP BY d.Deptnum, Budget
HAVING Sum(e.Salary) > Budget

It is reasonnable to assume that the BS12 approach consisting of
differentiating operations and presentation, while more abrupt, forced
the programmer to understand better the underlying concepts. In other
words, the introduction of *aesthetics* is one of the factors that
killed SQL's chances to ever become a relational operation language.

Reply With Quote
  #7  
Old   
Cimode
 
Posts: n/a

Default Re: Why is "group by" obligatory in SQL? - 07-24-2009 , 04:35 AM



I particularly like the following passage

When all is said and told, the "naturalness" with which we use our
native tongues boils down to the ease with which we can use them for
making statements the nonsense of which is not obvious.

Reply With Quote
  #8  
Old   
paul c
 
Posts: n/a

Default Re: Why is "group by" obligatory in SQL? - 07-24-2009 , 10:00 AM



Cimode wrote:
Quote:
Snipped.
Who can know how the original designers arrived at their decision?
Probably because they believed in the myth of natural programming
languages. SQL is no exception.

Dijskra clearly described the fundamental problem below...

The virtue of formal texts is that their manipulations, in order to be
legitimate, need to satisfy only a few simple rules; they are, when
you come to think of it, an amazingly effective tool for ruling out
all sorts of nonsense that, when we use our native tongues, are almost
impossible to avoid.

(http://www.cs.utexas.edu/~EWD/transc...xx/EWD667.html)

There were certainly other languages from the early SQL era (such as
QUEL) that would allow a similar formulation to what you propose,
meaning what you intend. (And indeed QUEL would allow other sensible
things with aggregates that SQL still doesn't easily allow.)

Personally I imagine the logic of SQL aggregation got defined in haste,
was "good enough", and they just moved swiftly on to the next challenge.
A small team, internal IBM rivalries, publication demands, etc. finished
the job.
When looking at the grammar of the initial BS12, it seems obvious that
we slowly shifted from a declarative oriented language to a verbose
driven language.

In BS12 the expresssion of aggregation was by the assignment of
variables seprating the aggregates from the operation such as

T1 = SUMMARY(EMP, GROUP(DEPTNUM), EMPS=COUNT, SALSUM=SUM(SALARY))
T2 = JOIN(T1, DEPT)
T3 = SELECT(T2, SALSUM > BUDGET)

comparing with SQL

SELECT d.Deptnum, Count(*) as Emps,
Sum(e.Salary) as Salsum, Budget
FROM Emp as e
JOIN Dept as d ON e.Deptnum = d.Deptnum
GROUP BY d.Deptnum, Budget
HAVING Sum(e.Salary) > Budget

It is reasonnable to assume that the BS12 approach consisting of
differentiating operations and presentation, while more abrupt, forced
the programmer to understand better the underlying concepts. In other
words, the introduction of *aesthetics* is one of the factors that
killed SQL's chances to ever become a relational operation language.

Great post, Cimode, Djikstra as profound as ever, this stuff bears
repeating every so often. Too bad so little remains of the history of
db language development from the 1970's, I think there are still lessons
to be learned from the scanty archives. The motivations of those days
must be murky to anybody who didn't grow up with assembler languages, in
fact I'd say they remaiin murky for most of the people who were there.
Many assembler people I knew seemed to have a constant question in the
back of their minds, but one they never verbalized - "what is the
interface?". I think they probably never even conceived it, it was just
an intermittent doubt in their subconcious mind. I don't think I heard
anybody ask that question until the 1980's. The programming majority in
the mainframe world then used Cobol and I'd say the language trend was
mostly trying to avoid background details because those were somehow
associated with the obscurity of machine language techniques (btw, the
term 'exit strategy' was in fairly common use by the system programming
world then, long before Pentagon historians dissected the military
problems of Vietnam.) In the commercial world of that time there was a
lot of emphasis on so-called 'structured programming techniques', which
we know today is just a form of language mysticism, at least insofar as
it disguises the real programmer's interface. Long before I even knew
that SQL assumed duplicate rows, I could never get why 'project' was
called 'select'.


The BS12 example emphasizes to me that the basic programming interface
is the relation and it's not hard to imagine it leading to a functional
language. I have only ever seen one database book that mentioned BS12
and that was a fairly brief section.


Another quote from the McJones site is from Don Chamberlin:

"I had a conversion experience, and I still
remember this. Ted Codd came to visit Yorktown, I think it
might have been at this symposium that Irv alluded to. He
gave a seminar and a lot of us went to listen to him. This
was as I say a revelation for me because Codd had a bunch
of queries that were fairly complicated queries and since I’d
been studying CODASYL, I could imagine how those
queries would have been represented in CODASYL by
programs that were five pages long that would navigate
through this labyrinth of pointers and stuff. Codd would sort
of write them down as one-liners. These would be queries
like, “Find the employees who earn more than their
managers.” [laughter] He just whacked them out and you
could sort of read them, and they weren’t complicated at all,
and I said, “Wow.” This was kind of a conversion
experience for me, that I understood what the relational
thing was about after that."

Reply With Quote
  #9  
Old   
paul c
 
Posts: n/a

Default Re: Why is "group by" obligatory in SQL? - 07-24-2009 , 11:01 AM



Cimode wrote:
,,,
Quote:
BS12 was developped after IBM started having second thoughts about the
massacre of System R massacre and *after* Oracle was on the way to
build an empire. The context explained below

http://www.mcjones.org/System_R/bs12.html

gives very interesting clues about the context of that second
attempt...
...
IBM's "Service Bureau" operations on all continents were basically a
time-sharing service for customers who couldn't afford their own
mainframes or couldn't justify a second mainframe for a single
application. I suspect there were other reasons, eg., political, the
various semi-autonomous IBM country organizations were famous for their
individual 'not-invented-here' thinking. As well, there were likely big
operability questions regarding the use of SQL in the various Service
Bureaux because most of the customer work was submitted in a batch mode
that didn't encourage timely error correction. I remember some of the
time-sharing operations in N.A. from those days, the instrumentation
experts who were responsible for cpu and other resource billing
algorithms had god-like status in their organizations. Once in a while
a salesman would quote a fixed price based on loose specs that turned
out to be ridiculously low - you should have seen the s... fly in the
executive suite when that happened. Just as now, practically no
significant decision was made on technical grounds.

Reply With Quote
  #10  
Old   
Cimode
 
Posts: n/a

Default Re: Why is "group by" obligatory in SQL? - 07-24-2009 , 11:35 AM



Snipped..

Quote:
I suspect that what a 'TRDBMS' is, isn't yet fully known. *Lots of open
questions, here're just a few:

- Codd espoused logical data independence, yet by assuming some
relations couldn't be updated in certain ways, he allowed a kind of
contradiction, or at least a kind of dead-end.
That is unclear. Unfortunately we can't ask Codd for that.

Quote:
- there is normalization theory to do with avoiding redundancy, but
nothing comparable to do with avoiding ambiguity.
Ambiguity is subjective.

Quote:
- where is the constraint theory? *will normalization turn out to be
just a small part of this?
I would not state exactly that way. I know for a fact that solving
the problem of constraint specialization representation simplifies
normalization up to a point where it is not normalization anymore, at
least not in the traditional sense of a cumbersome process.

I have been working on that for a full decade and made fundamental
discoveries that not only applies to ra but also to computing. Just
no time to publish a book about it.

Quote:
- is the ultimate interface a program that it is in effect a relation?
Precisely. My point is that *only* a computing model can bring a
response to such question and that such question is orthogonal to RM.

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.