dbTalk Databases Forums  

Re: Dreaming About Redesigning SQL

comp.databases.object comp.databases.object


Discuss Re: Dreaming About Redesigning SQL in the comp.databases.object forum.



Reply
 
Thread Tools Display Modes
  #41  
Old   
Patrick K. O'Brien
 
Posts: n/a

Default Re: Dreaming About Redesigning SQL - 10-10-2003 , 10:45 AM






Lee Fesperman <firstsql (AT) ix (DOT) netcom.com> writes:

Quote:
Jan Hidders wrote:

Lee Fesperman wrote:
Jan Hidders wrote:

Correct. There's absolutely no reason to believe that you cannot
have data independence with logical pointer or references. I
would however argue that allowing entities without representable
keys is not a good idea.

Incorrect. Even though you call them 'logical' pointers, they
are still physical artifacts and have no place in a truly
logical view of the database.

Logical pointers can be defined at the logical level and
implemented in various ways. They are just as much physical
artifacts as, say, relations are.

Databases are about
data, and pointers are not data (or meta-data).

They carry information. That makes them data. I see no good reason
to use a more restrictive definition.

Because the information they carry is physical, which doesn't belong
in a logical model.

Let me try again:

In the relational model, tables (relations) are completely
independent except through the action of inter-table constraints
(like referential integrity). The constraints are declarative and
truly logical. They can be removed without changing the contents of
a single column in the participating tables.

Pointers form a rigid bond between table. To my mind, this makes
them physical --- if it walks like a duck, etc.
But references (as opposed to pointers) are not that far off from
surrogate keys, at least in terms of how they can be used to enforce
referential integrity. But don't take that as trolling, I'm just
setting the stage for a few questions below.

Quote:
Pointers don't have the flexibility associated with a logical
entity. You must change table contents to remove that bond.
I agree with that, and the same is true of references - they do
represent a bond between objects, which makes them inflexible. But if
you compared an RDBMS that used surrogate keys to establish RI between
two tables, you've got a certain inflexibility there as well. If
there is no longer a relationship between the two tables, the FK
should be dropped along with the RI constraint, right?

Quote:
It's been argued here that pointers (references) could be mapped to
pk/fk constructs. Besides throwing away other benefits of pk/fk, it
doesn't work in the general case, for instance:
The only benefit I've come up with is the case where something other
than a surrogate key is used as the fk (like state postal code, for
example), eliminating the need to do a join to get a meaningful
value. What other benefits do you consider to be lost when references
are used instead of values?

Quote:
+ When the fk is part of the pk of the referencing table,
If I understand you correctly, I disagree with this. I understand you
to say that if a fk makes up part of the pk of the referencing table,
and the fk was a reference, instead of a value, the dbms would not be
able to enforce the uniqueness constraint on the pk of the referencing
table. That is not true about my ODBMS implementation (PyPerSyst).
Did I understand you correctly?

Quote:
+ When the fk columns are shared with other fk's.
I don't understand this point. Can you provide a bit more detail
about what you mean, or maybe an example? Thanks.

--
Patrick K. O'Brien
Orbtech http://www.orbtech.com/web/pobrien
-----------------------------------------------
"Your source for Python programming expertise."
-----------------------------------------------


Reply With Quote
  #42  
Old   
Patrick K. O'Brien
 
Posts: n/a

Default Re: Dreaming About Redesigning SQL - 10-10-2003 , 11:22 AM






lauri.pietarinen (AT) atbusiness (DOT) com (Lauri Pietarinen) writes:

Please allow me to take exception to a few of your points.

Quote:
If you take a standard SQL-query with sorting and grouping and some
joins and compare it with hand written "navigational" code you will
notice that what can be expressed in SQL in 5-10 lines of code will
require several pages of hand written code for equivalent result.
That really depends on two things: the specific SQL query, the
"navigational" programming language, and the capabilities of the
ODBMS. In general, I would agree that a declarative language is going
to be able to express an operation more concisely than a procedural or
object-oriented language. But a language like Python is very powerful
and expressive. I can do the equivalent of a simple SQL query in
about as many lines of code in Python using the PyPerSyst ODBMS. It
would take an incredibly complex SQL query before I'd get to "several
pages of hand written code". PyPerSyst itself is only several pages
of hand written code. ;-)

Quote:
The third issue is that the SQL query can be optimised to a much
higher degree than procedural code. Say you add a new index. The
DBMS will/can immediately start using it without any user
intervention (if it decides to). In the procedural alternative you
would have to recode your query.
You wouldn't necessarily *have to*. PyPerSyst does not yet have a
declarative query capability. But if you add an index, the same
procedural code will now make use of that index (or rather, the
procedural code is calling methods of objects that are smart enough to
look for indexes to optimize themselves without requiring any changes
in application code).

I only point this out becase critics of object databases really need
to improve their understanding of the capabilities of object languages
and object databases. They just aren't as inflexible as some pundits
would like to claim.

Quote:
The optimiser can also take into account the cardinality of tables
(=number of rows) so that it will produce a different plan for a
small database and a big database.
PyPerSyst doesn't do this, but it is completely within the realm of
possibilities. All we are really talking about is levels of
abstraction. Nobody should be writing procedural code at such a low
level of abstraction that their code can't be optimized at runtime
based on the presence or absence of indexes, or the size of the data
involved. There is nothing that precludes an ODBMS, or procedural
code, from being able to be optimized at runtime without changing
code.

Quote:
An optimiser can even be sensitive to input from users at run time,
say you have

select name, salary
from nurses
where sex = ?

If the parameter given at run time is 'M' then it could be
advantageous to use an index, but not if parameter is 'F'. This
decision could be made by the optimiser "on the fly". I hope you get
the picture...
I hope you get the picture that every example you have given can be
true about procedural code. Here is a very simple example of the
source code for the "find" method of the PyPerSyst Extent class:

def find(self, **criteria):
"""Return list of instances exactly matching all criteria."""
instances = []
names = criteria.keys()
spec = self._makekeyspec(names)
if spec in self._altkeys:
# Use alternate keys as a shortcut.
key = self._makekey(criteria, spec)
d = self._altkeys[spec]
if key in d:
instance = d[key]
instances.append(instance)
else:
# Scan all instances for a match.
for instance in self._instances.values():
match = True
for name, value in criteria.items():
if getattr(instance, name) != value:
match = False
break
if match:
instances.append(instance)
return instances

This find method returns a list of instances matching some criteria
supplied by the application code. A specific example of its use would
look like this:

people = db.root['Person'].find(name='Lauri Pietarinen')

If there is an alternate key index on the name attribute, the find
method will use that as an internal optimization. If one does not
exist, then it will scan all instance of the Person class looking for
a match. Let's assume that today there is no index, and all instances
are scanned. If tomorrow the schema is changed to declare that there
should be an alternate key enforced on the name attribute, then
PyPerSyst will build and maintain an index that will be used to
enforce that constraint. As a side benefit, the find method will also
use that index to optimize its performance. The application code will
not have changed.

Quote:
Other than that, what's wrong with the network (or similar)
models? I have not found a good link to such a discussion though
I hear it repeated that there are certain classical arguments
against them.

In 1974 there was the BIG DEBATE where Codd defended the RM against
network guys. I wonder if there is a transcript available?

I think one of Codds arguments was that the number of manipulative
operators needed in the RM (i.e. insert, update, delete) was much
lower than in CODASYL (connect, disconnect, etc etc...)
Any arguments that are true about CODASYL are not necessarily true
about current ODBMSs, in spite of the fact that they both could be
described as implementing a network model. I have also tried to find
information about the weaknesses of the network model and have not
found much of value. Some of the obvious weaknesses of network
products, such as the difficulty in changing your schema because
pointers were static and tied to the physical location of information
on disk, are easily avoided in modern object databases. For example,
PyPerSyst uses references, but has no problem evolving schemas and
migrating instances from the old schema to the new schema. IMO,
references in modern OO languages aren't as evil as pointers in old
CODASYL products. But I'm still looking for information to confirm or
dispute that opinion.

Thanks for listening. :-)

--
Patrick K. O'Brien
Orbtech http://www.orbtech.com/web/pobrien
-----------------------------------------------
"Your source for Python programming expertise."
-----------------------------------------------


Reply With Quote
  #43  
Old   
Anthony W. Youngman
 
Posts: n/a

Default Re: Dreaming About Redesigning SQL - 10-10-2003 , 12:57 PM



In article <bm4sa8$bi4iq$1 (AT) ID-152540 (DOT) news.uni-berlin.de>, Costin Cozianu
<c_cozianu (AT) hotmail (DOT) com> writes
Quote:
select --+leading(d) use_nl(e) index(e deptref_ix)
E.Name FROM Emp E, Dept D
WHERE E.Deptref = REF(d) AND D.Name = 'shoe';

did follow the access path that I wanted to. Of course, it still remains to
see the more challenging task solved with the hints removed.

It is optimizer that is always a problem:-)



Well, I'd be least concerned about the optimizer. If you're using Oracle , I'd
lookup the documentation for the cute name

IS_DANGLING

Cool stuff.
I'd just use a database that doesn't NEED an optimiser :-)

Cheers,
Wol
--
Anthony W. Youngman - wol at thewolery dot demon dot co dot uk
Witches are curious by definition and inquisitive by nature. She moved in. "Let
me through. I'm a nosey person.", she said, employing both elbows.
Maskerade : (c) 1995 Terry Pratchett


Reply With Quote
  #44  
Old   
Mikito Harakiri
 
Posts: n/a

Default Optimizing Queries in OODBs (was: Dreaming About Redesigning SQL) - 10-10-2003 , 01:08 PM



"Patrick K. O'Brien" <pobrien (AT) orbtech (DOT) com> wrote

Quote:
lauri.pietarinen (AT) atbusiness (DOT) com (Lauri Pietarinen) writes:
You wouldn't necessarily *have to*. PyPerSyst does not yet have a
declarative query capability. But if you add an index, the same
procedural code will now make use of that index (or rather, the
procedural code is calling methods of objects that are smart enough to
look for indexes to optimize themselves without requiring any changes
in application code).

I only point this out becase critics of object databases really need
to improve their understanding of the capabilities of object languages
and object databases. They just aren't as inflexible as some pundits
would like to claim.
I just went through simple stupidity test for Object Relational yesterday.
Yes, Object Relational folks are not totally dumb, their stuff doesn't have
*obvious* flaws. Therefore, yes it seems possible in principle to optimize
queries in Object Extensions to SQL, and maybe in even OQL as well.

The problem is that they raise complexity so significatly that it is
extremely challenging for the optimizer to catch up. Note, that sql
optimization is struggling to solve all problems in "simple" area of "flat"
relations. Or, to put it in other words, how large is optimization team in
your little PyPerSyst?




Reply With Quote
  #45  
Old   
Patrick K. O'Brien
 
Posts: n/a

Default Re: Optimizing Queries in OODBs (was: Dreaming About Redesigning SQL) - 10-10-2003 , 01:36 PM



"Mikito Harakiri" <mikharakiri (AT) iahu (DOT) com> writes:

Quote:
"Patrick K. O'Brien" <pobrien (AT) orbtech (DOT) com> wrote in message
news:m265ixf4fe.fsf (AT) orbtech (DOT) com...
lauri.pietarinen (AT) atbusiness (DOT) com (Lauri Pietarinen) writes:
You wouldn't necessarily *have to*. PyPerSyst does not yet have a
declarative query capability. But if you add an index, the same
procedural code will now make use of that index (or rather, the
procedural code is calling methods of objects that are smart enough to
look for indexes to optimize themselves without requiring any changes
in application code).

I only point this out becase critics of object databases really need
to improve their understanding of the capabilities of object languages
and object databases. They just aren't as inflexible as some pundits
would like to claim.

I just went through simple stupidity test for Object Relational yesterday.
Yes, Object Relational folks are not totally dumb, their stuff doesn't have
*obvious* flaws. Therefore, yes it seems possible in principle to optimize
queries in Object Extensions to SQL, and maybe in even OQL as well.

The problem is that they raise complexity so significatly that it is
extremely challenging for the optimizer to catch up. Note, that sql
optimization is struggling to solve all problems in "simple" area of "flat"
relations. Or, to put it in other words, how large is optimization team in
your little PyPerSyst?
I don't completely understand all of what you have said, or exactly
what you are asking, but I'll try to provide some information that may
answer your question. First, PyPerSyst is not Relational, nor is it
Object Relational, nor does it support OQL. It is a persistence
system for Python objects. It can persist any Python object graph,
but it also comes with a couple of base classes that can be used to
create an object system that has some *qualities* in common with
Relational databases.

As for optimization, that isn't quite as much of an issue, as
PyPerSyst is an in-memory database, which means that all objects
reside in RAM. Transactions are logged to disk before being executed,
and the combination of system snapshot and transaction log is used to
recover from a crash. So performance is good without a lot of
attention to optimization.

There is no optimization team. As of right now, I have written all
the PyPerSyst code. But I have drawn inspiration from other projects
and academic papers, and there are several developers actively
contributing ideas as well. PyPerSyst is free and open source, so you
can examine the source code yourself:

http://sourceforge.net/projects/pypersyst/

Did that answer your question?

--
Patrick K. O'Brien
Orbtech http://www.orbtech.com/web/pobrien
-----------------------------------------------
"Your source for Python programming expertise."
-----------------------------------------------


Reply With Quote
  #46  
Old   
Mikito Harakiri
 
Posts: n/a

Default Re: Optimizing Queries in OODBs (was: Dreaming About Redesigning SQL) - 10-10-2003 , 02:07 PM




"Patrick K. O'Brien" <pobrien (AT) orbtech (DOT) com> wrote

Quote:
"Mikito Harakiri" <mikharakiri (AT) iahu (DOT) com> writes:
I just went through simple stupidity test for Object Relational
yesterday.
Yes, Object Relational folks are not totally dumb, their stuff doesn't
have
*obvious* flaws. Therefore, yes it seems possible in principle to
optimize
queries in Object Extensions to SQL, and maybe in even OQL as well.

The problem is that they raise complexity so significatly that it is
extremely challenging for the optimizer to catch up. Note, that sql
optimization is struggling to solve all problems in "simple" area of
"flat"
relations. Or, to put it in other words, how large is optimization team
in
your little PyPerSyst?

I don't completely understand all of what you have said, or exactly
what you are asking, but I'll try to provide some information that may
answer your question. First, PyPerSyst is not Relational, nor is it
Object Relational, nor does it support OQL. It is a persistence
system for Python objects. It can persist any Python object graph,
but it also comes with a couple of base classes that can be used to
create an object system that has some *qualities* in common with
Relational databases.

As for optimization, that isn't quite as much of an issue, as
PyPerSyst is an in-memory database, which means that all objects
reside in RAM. Transactions are logged to disk before being executed,
and the combination of system snapshot and transaction log is used to
recover from a crash. So performance is good without a lot of
attention to optimization.

There is no optimization team. As of right now, I have written all
the PyPerSyst code. But I have drawn inspiration from other projects
and academic papers, and there are several developers actively
contributing ideas as well. PyPerSyst is free and open source, so you
can examine the source code yourself:
I apologyse for rude tone of my message. Our newsgroup is overflown with
messages like "Could you please answer the following question, and here is
BTW a little system that I developed".

Optimization is a hard problem. It doesn't matter if the system operates in
memory or on hard disk. If it seems simple in your scope, that means that
there are limitations in flexibility, query expressiveness, and/or power.




Reply With Quote
  #47  
Old   
Paul G. Brown
 
Posts: n/a

Default Re: Optimizing Queries in OODBs (was: Dreaming About Redesigning SQL) - 10-10-2003 , 05:05 PM



"Mikito Harakiri" <mikharakiri (AT) iahu (DOT) com> wrote


[ snip ]

Quote:
I just went through simple stupidity test for Object Relational yesterday.
Yes, Object Relational folks are not totally dumb, their stuff doesn't have
*obvious* flaws. Therefore, yes it seems possible in principle to optimize
queries in Object Extensions to SQL, and maybe in even OQL as well.

The problem is that they raise complexity so significatly that it is
extremely challenging for the optimizer to catch up. Note, that sql
optimization is struggling to solve all problems in "simple" area of "flat"
relations. Or, to put it in other words, how large is optimization team in
your little PyPerSyst?
Which bit of object relational drew your ire?

Extensible domains adds some complexity, but it also simplifies some
stuff. Queries over UDTs/UDFs works pretty good in Postgres. Check out the
Sequoia 2000 and Bucky benchmarks.

References and table inheritance make a lot harder.

KR

Pb


Reply With Quote
  #48  
Old   
Jan Hidders
 
Posts: n/a

Default Re: Dreaming About Redesigning SQL - 10-10-2003 , 06:09 PM



Paul G. Brown wrote:
Quote:
Jan Hidders <jan.hidders (AT) pandora (DOT) be.REMOVE.THIS> wrote


Er, well, file names are actually not abstract because they have a
concrete representation that you can read.

We could go round and round on this. I'd observe that there is a whole
class of things which can potentially go into tuple attributes that aren't
'values': for example, you might put a query in there,
A query is a value.

Quote:
or a reference,
A reference, just by itself without some magical DEREF funcion is an
(abstract) value.

Quote:
or a filename,
A filename is a value.

Quote:
or an executable script to generate a value.
A script is a value.

Quote:
The information content of such values are always 'one step removed' from their
representation, and may in fact be ambiguous.
That you can do an extra step to derive some more information doesn't
mean that they didn't already have some information to begin with. The
crucial question is if in this extra step you need some extra
information that is not to be found anywhere in the tables. For the
evaluation of the query you don't. For dereferencing the reference
without the help of a table that associates them with their destination,
you do. For looking up the contents of the file without the help of a
table that associates them with their contents, you do. For executing
the script you don't.

So as long as you don't expect a DEREF function or a function that
magically looks up the content of a file none of you examples violates
the information principle.

-- Jan Hidders



Reply With Quote
  #49  
Old   
Mikito Harakiri
 
Posts: n/a

Default Re: Optimizing Queries in OODBs (was: Dreaming About Redesigning SQL) - 10-10-2003 , 06:13 PM



"Paul G. Brown" <paul_geoffrey_brown (AT) yahoo (DOT) com> wrote

Quote:
Which bit of object relational drew your ire?

Extensible domains adds some complexity, but it also simplifies some
stuff. Queries over UDTs/UDFs works pretty good in Postgres. Check out
the
Sequoia 2000 and Bucky benchmarks.

References and table inheritance make a lot harder.
Yes, I meant References, Nested Collections, and other redundancies, not
User Defined Types (although, the latter are not a total success either).
Recently spatial folks shifted their focus to topologies, so they use a
respectable method, unlike those "Everything is object" dummies.




Reply With Quote
  #50  
Old   
Christopher Browne
 
Posts: n/a

Default Re: Optimizing Queries in OODBs - 10-10-2003 , 07:47 PM



After takin a swig o' Arrakan spice grog, paul_geoffrey_brown (AT) yahoo (DOT) com (Paul G. Brown) belched out...:
Quote:
Extensible domains adds some complexity, but it also simplifies some
stuff. Queries over UDTs/UDFs works pretty good in Postgres. Check out the
Sequoia 2000 and Bucky benchmarks.
One of the bigger challenges with PostgreSQL is that having extensible
aggregates (one of the "more advanced" relational features) makes it
tough to do aggregate-specific optimizations.

Typical to this, for instance, is that aggregates like MAX() and MIN()
can't trivially take advantage of indices. As "aggregates," their
values have to be collected by walking across the elements in the set.
But it is often the case that you could do the following transform:

select max(field) from table where [conditions];

--->

select field from table where [conditions] order by field descending
limit 1;

To "special case" this is a very dangerous idea, because:
a) It's a "hack" for some aggregates;
b) If other aggregates are needed, it may be counterproductive;
c) If you haven't got a good index for the purpose, "plodding
through the aggregate" may not be worse.

The flip side is that by having aggregates be "generic," it doesn't
take hideous hacks if you need to add in sophisticated aggregates
(common for statistical work).

Of course, this is pretty divergent from the subject; PostgreSQL is
certainly NOT an "OODB"; its extensions tend to add to its 'relational
fidelity.'
--
let name="cbbrowne" and tld="ntlug.org" in String.concat "@" [name;tld];;
http://cbbrowne.com/info/linuxxian.html
Rules of the Evil Overlord #128. "I will not employ robots as agents
of destruction if there is any possible way that they can be
re-programmed or if their battery packs are externally mounted and
easily removable." <http://www.eviloverlord.com/>


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.