dbTalk Databases Forums  

Datatype like "Blob"

comp.databases.postgresql comp.databases.postgresql


Discuss Datatype like "Blob" in the comp.databases.postgresql forum.



Reply
 
Thread Tools Display Modes
  #11  
Old   
jpd
 
Posts: n/a

Default Re: Datatype like "Blob" - 10-18-2006 , 12:33 PM






Begin <4pmq3gFjghafU1 (AT) individual (DOT) net>
On 2006-10-18, Robert Klemme <shortcutter (AT) googlemail (DOT) com> wrote:
Quote:
On 18.10.2006 14:59, jpd wrote:
But you save yourself the hassle of dealing with one additional
component, namely the file system.
Except that in this scenario you need it anyway, and it is in fact just
about the simplest access method of the range available, and optimised
for the task to boot. This isn't always true, of course, but that wasn't
the point. The point was that there are situations where it is true, and
that I prefer to organise systems in such a way to be able to exploit
the simplest mechanism available.


Quote:
If you want to keep that in sync, you're likely looking at some content
management system, and for consistency you'd probably have to make that
self-referential as well. So adding a couple of checks for url columns
referencing datafiles somewhere in the system then isn't that much of a
problem. I'll admit that putting literally everything in a database has
its elegance, but the model itself only holds up to a certain point and
might not be all that practical beyond that. You did mention backups,
and a backup solution also needs to back it all up together.

What point exactly do you mean?
I don't know where the point lies, exactly. I daresay it isn't really
important to locate it very precisely, either. :-)


[snip]
Quote:
From my experience file systems don't do very well with large amounts
of files in a single directory. If you start distributing files across
several directories you'll be better off by just using a database's
built in indexing which is exactly built for that.
There are several file systems that do deal well with many files, eg
FreeBSD's options DIRHASH. Also, one could easily store files in the
filesystem but use the database to locate the files (report an URL
as stored in a table, treating it as being opaque), so for normal
retrievals no traversing of directories is necessairy and multiple
directory levels can be used without added complexity to the fastpath.


--
j p d (at) d s b (dot) t u d e l f t (dot) n l .
This message was originally posted on Usenet in plain text.
Any other representation, additions, or changes do not have my
consent and may be a violation of international copyright law.


Reply With Quote
  #12  
Old   
Robert Klemme
 
Posts: n/a

Default Re: Datatype like "Blob" - 10-19-2006 , 04:40 AM






On 18.10.2006 19:33, jpd wrote:
Quote:
Begin <4pmq3gFjghafU1 (AT) individual (DOT) net
On 2006-10-18, Robert Klemme <shortcutter (AT) googlemail (DOT) com> wrote:
On 18.10.2006 14:59, jpd wrote:
But you save yourself the hassle of dealing with one additional
component, namely the file system.

Except that in this scenario you need it anyway, and it is in fact just
about the simplest access method of the range available, and optimised
for the task to boot. This isn't always true, of course, but that wasn't
the point. The point was that there are situations where it is true, and
that I prefer to organise systems in such a way to be able to exploit
the simplest mechanism available.
Although I tend to agree my experience tells me that the simplest
approach often does break at some point. Either it is during the first
project because when implementing starts some forgotten aspects
(requirements, limitations...) show up or it is later because throughout
the lifetime additional requirements must be fulfilled. All this tells
me to apply some caution when choosing approaches and not jump on the
first or simplest solution that comes to mind. :-)

Quote:
problem. I'll admit that putting literally everything in a database has
its elegance, but the model itself only holds up to a certain point and
might not be all that practical beyond that. You did mention backups,
and a backup solution also needs to back it all up together.
What point exactly do you mean?

I don't know where the point lies, exactly. I daresay it isn't really
important to locate it very precisely, either. :-)
Um, I was not so much after a concrete figure but rather what scale you
were talking about. Is it number of items / files? Is it volume? Is
it application complexity? Is it something else?

Quote:
[snip]
From my experience file systems don't do very well with large amounts
of files in a single directory. If you start distributing files across
several directories you'll be better off by just using a database's
built in indexing which is exactly built for that.

There are several file systems that do deal well with many files, eg
FreeBSD's options DIRHASH.
Yeah, I have heard that the development of file systems goes into that
direction. I not sure how ubiquitous those are yet.

Quote:
Also, one could easily store files in the
filesystem but use the database to locate the files (report an URL
as stored in a table, treating it as being opaque), so for normal
retrievals no traversing of directories is necessairy and multiple
directory levels can be used without added complexity to the fastpath.
Yes, but you nevertheless need the logic to distribute files across
directories; if you think a bit about this you will likely start
implementing some tree mechanisms including attempting to equally fill
directories etc. That's exactly what DB's can do very well as I tried
to convey in my last posting (probably not clear enough).

Thanks for the good discussion!

Kind regards

robert


Reply With Quote
  #13  
Old   
jpd
 
Posts: n/a

Default Re: Datatype like "Blob" - 10-20-2006 , 01:42 AM



Begin <4povhaFjjd2nU1 (AT) individual (DOT) net>
On 2006-10-19, Robert Klemme <shortcutter (AT) googlemail (DOT) com> wrote:
Quote:
On 18.10.2006 19:33, jpd wrote:
Begin <4pmq3gFjghafU1 (AT) individual (DOT) net
On 2006-10-18, Robert Klemme <shortcutter (AT) googlemail (DOT) com> wrote:
On 18.10.2006 14:59, jpd wrote:
Although I tend to agree my experience tells me that the simplest
approach often does break at some point. Either it is during the
first project because when implementing starts some forgotten aspects
(requirements, limitations...) show up or it is later because
throughout the lifetime additional requirements must be fulfilled. All
this tells me to apply some caution when choosing approaches and not
jump on the first or simplest solution that comes to mind. :-)
True enough. The more complex the approach the more opportunity for
breakage. Sometimes one does need complexity to deal with an inherently
complex situation, but that is always a tradeoff against the extra risk
it comes with.


Quote:
I'll admit that putting literally everything in a database has
its elegance, but the model itself only holds up to a certain
point and might not be all that practical beyond that. You did
mention backups, and a backup solution also needs to back it all
up together.
What point exactly do you mean?

I don't know where the point lies, exactly. I daresay it isn't really
important to locate it very precisely, either. :-)

Um, I was not so much after a concrete figure but rather what scale you
were talking about. Is it number of items / files? Is it volume? Is
it application complexity? Is it something else?
Let's start from the absurd extreme: There's the database application
itself, the OS, the webserver application, and the scripting support
thing (coldfusion, php, etc.). Practically, since you need some sort
of OS installation to do a full restore anyway, it makes sense to pack
those together. Then you'll have an unconfigured system, but all you
need to do is add the configuration and the data.

Continuing, it is going to be pretty hard to store all your webpages in
the database. You can go a long way by storing all the content then have
a very simple frame that drives the scripting engine to get the content,
but unless you're writing a webserver module that does that and can be
packed sensibly with the above baseline environment, that is going to
have to be outside the database.

Then there are the configuration files. You could put all of them except
the ones for the database inside the database, then pull them out at
system start, but that, frankly, is a bit silly, just like storing the
httpd binary inside the database would be silly. Looking at PostgreSQL,
there is configuration data both inside and outside the database, and
it's pretty clear that the stuff kept in configuration files would be
impractical, silly, or dangerous inside.

Now, the simplistic approach to choosing where to draw the line is to
have the scripting engine enabled files contain one of those magical
``frameworks'', which presumably contains formatting information and
corporate identiry, and have content reside inside the database.

That approach, while seeming sensible, does have its edge cases where
it storing big chunks of data[3] starts to get silly, when there is no
gain to be had from having the database carry the burden of storing it,
nevermind transporting it. You can't sendfile() on data stored in the
database.

Where databases are made to manage tuples of small bits of data really
well, there is no sense in storing entire movies as the database isn't
able to do anything useful with that data anyway. You'd be much better
off extracting the metadata, then storing that along with a pointer to
the bulk of the data at a location elsewhere[1].

Storing all content in a database has a further drawback, and that is
that you now need a database and a scripting engine in addition to the
webserver, introducing several points of faillure: the database, the
scripting engine, and the links between them and to the webserver.

The problem here is that it isn't actually necessairy or even that it
adds value, just convenience for you, but not for your viewer, and as
such a risk and a waste of computing cycles. One could just as well have
a script trigger a checkout of all reasonably static pages on every
change, and serve them statically. This also allows for running on every
update an automated systems check on the pages that don't need a runtime
database.

With that you can catch problems early and reduce the risk of showing
the hapless website viewer a page full of SQL, php, coldusion errors,
or merely a http 500 response. It does happen that things break now and
then but if they show up on the first page of your site that's much
more damaging than that some extra functionality breaks but one can
still read the bits about who you are, what you do, where you can be
contacted, and so on.


So, in short, databases are tools that like all good tools are really
useful for what they do, but that doesn't mean they're useful for
everything else, too. I think that there are cases where storing blobs
is useful, but that those are few and happen less often than that
they're actually used. The same goes for most website content stored in
databases.


[snip]
Quote:
There are several file systems that do deal well with many files, eg
FreeBSD's options DIRHASH.

Yeah, I have heard that the development of file systems goes into that
direction. I not sure how ubiquitous those are yet.
There are numerous features only found in one database that aren't found
elsewhere, and still they see use. You're right that caution is needed,
but that doesn't have to preclude use of the features, provided the
choice is made deliberately and documented.


Quote:
Also, one could easily store files in the filesystem but use the
database to locate the files (report an URL as stored in a table,
treating it as being opaque), so for normal retrievals no traversing
of directories is necessairy and multiple directory levels can be
used without added complexity to the fastpath.

Yes, but you nevertheless need the logic to distribute files across
directories; if you think a bit about this you will likely start
implementing some tree mechanisms including attempting to equally fill
directories etc. That's exactly what DB's can do very well as I tried
to convey in my last posting (probably not clear enough).
True, altough even the logic can be amazingly simple. Also, since the
store is made in conjunction with a database, one could use the database
its features to achieve spread across the filesystem. The only thing
you don't use the database for is actual storage.


[1] It does introduce an external dependency, but checking that can
easily be integrated with the website-internal[2] consistency checks
for links and such that you'd need anyway.
[2] Not necessairily database-internal.
[3] Data that is otherwise meaningless to the database.

--
j p d (at) d s b (dot) t u d e l f t (dot) n l .
This message was originally posted on Usenet in plain text.
Any other representation, additions, or changes do not have my
consent and may be a violation of international copyright law.


Reply With Quote
  #14  
Old   
Robert Klemme
 
Posts: n/a

Default Re: Datatype like "Blob" - 10-20-2006 , 02:48 AM



On 20.10.2006 08:42, jpd wrote:
Quote:
Begin <4povhaFjjd2nU1 (AT) individual (DOT) net
On 2006-10-19, Robert Klemme <shortcutter (AT) googlemail (DOT) com> wrote:
On 18.10.2006 19:33, jpd wrote:
Begin <4pmq3gFjghafU1 (AT) individual (DOT) net
On 2006-10-18, Robert Klemme <shortcutter (AT) googlemail (DOT) com> wrote:
On 18.10.2006 14:59, jpd wrote:
Although I tend to agree my experience tells me that the simplest
approach often does break at some point. Either it is during the
first project because when implementing starts some forgotten aspects
(requirements, limitations...) show up or it is later because
throughout the lifetime additional requirements must be fulfilled. All
this tells me to apply some caution when choosing approaches and not
jump on the first or simplest solution that comes to mind. :-)

True enough. The more complex the approach the more opportunity for
breakage. Sometimes one does need complexity to deal with an inherently
complex situation, but that is always a tradeoff against the extra risk
it comes with.
Right.

Quote:
I'll admit that putting literally everything in a database has
its elegance, but the model itself only holds up to a certain
point and might not be all that practical beyond that. You did
mention backups, and a backup solution also needs to back it all
up together.
What point exactly do you mean?
I don't know where the point lies, exactly. I daresay it isn't really
important to locate it very precisely, either. :-)
Um, I was not so much after a concrete figure but rather what scale you
were talking about. Is it number of items / files? Is it volume? Is
it application complexity? Is it something else?
<snip/>

So basically your scale is application complexity if I read you correctly.

Quote:
Where databases are made to manage tuples of small bits of data really
well, there is no sense in storing entire movies as the database isn't
able to do anything useful with that data anyway. You'd be much better
off extracting the metadata, then storing that along with a pointer to
the bulk of the data at a location elsewhere[1].
The database can do various things with them that are in fact usefule:

- make binary streams available to various clients (you can achive that
with a network mount or periodic distribution as well)

- ensure data consistency via transactions

- optimize storage and accessibility through its internal caching and
distribution mechanisms

Quote:
So, in short, databases are tools that like all good tools are really
useful for what they do, but that doesn't mean they're useful for
everything else, too.
Agreed.

Regards

robert


Reply With Quote
  #15  
Old   
jpd
 
Posts: n/a

Default Re: Datatype like "Blob" - 10-20-2006 , 04:01 AM



Begin <4prdbcFk1vceU1 (AT) individual (DOT) net>
On 2006-10-20, Robert Klemme <shortcutter (AT) googlemail (DOT) com> wrote:
Quote:
So basically your scale is application complexity if I read you correctly.
Err. Whole system complexity, rather, as it is a bit unclear what
exactly constitutes an ``application'' in context, where I pretty much
spelled out what this whole system constitutes in a previous post.


[snippage]
Quote:
- ensure data consistency via transactions
To the webscripter only, but that is only one step in the chain if we
are talking database->webscripter->webserver->browser. Not useful to the
browser, and not meaningful for the webscripter if we're talking large
chunks of data that it will only pass on anyway.


Quote:
- optimize storage and accessibility through its internal caching and
distribution mechanisms
Except that likely its internal storage mechanism is optimized to deal
with a high volume of small bits, not big large chunks of data. Just as
SQL isn't really suited as a file transport mechanism.


--
j p d (at) d s b (dot) t u d e l f t (dot) n l .
This message was originally posted on Usenet in plain text.
Any other representation, additions, or changes do not have my
consent and may be a violation of international copyright law.


Reply With Quote
  #16  
Old   
Robinson
 
Posts: n/a

Default Re: Datatype like "Blob" - 10-25-2006 , 09:11 AM



Quote:
[snippage]
A very interesting discussion above. I am currently the developer of an
image storage/management system (infrared images); I was looking into using
PostGRE because we are currently using SQL Server (the free one), with it's
annoying database size and CPU utilization limits. We currently store
images as binary blobs in SQL Server. The difference in speed between
fetching from the database and fetching from the filesystem (as our file
explorer window does) is very noticable. After looking into many of the
available alternatives, I'm starting to think that storing binaries like
this in the database isn't so clever. Although we are getting transactional
integrity and an easy backup/restore mechanism (one database file), we can
overcome many limitations by storing a URL/moniker and putting the images on
the file system, including the ability to add movie files too.

I found an interesting blog post on what is actually going on in the average
system when you manipulate image blobs:

http://mysqldump.azundris.com/archiv...-Database.html

I found it insightful. My only two concerns with the file system approach
are (1) ensuring integrity between the file system and database and (2)
backing up and restoring. My opinion on this is somewhat tainted by having
to deal with Visual Source Safe and it's associated "integrity errors" over
the last few years (in the absence of file system transactions). For the
latter, we have to cope with both corporate users who have an administrator
backing up "the server" and single users who want to put a backup onto a set
of CD's. We need to somehow combine package the file system images
alongside the database backup. .


Robin





Reply With Quote
  #17  
Old   
Laurenz Albe
 
Posts: n/a

Default Re: Datatype like "Blob" - 10-25-2006 , 11:30 AM



Robinson <toomuchspamhaspassed (AT) myinboxtoomuchtoooften (DOT) com> wrote:
Quote:
We currently store
images as binary blobs in SQL Server. The difference in speed between
fetching from the database and fetching from the filesystem (as our file
explorer window does) is very noticable. After looking into many of the
available alternatives, I'm starting to think that storing binaries like
this in the database isn't so clever. Although we are getting transactional
integrity and an easy backup/restore mechanism (one database file), we can
overcome many limitations by storing a URL/moniker and putting the images on
the file system, including the ability to add movie files too.

I found an interesting blog post on what is actually going on in the average
system when you manipulate image blobs:

http://mysqldump.azundris.com/archiv...-Database.html

I found it insightful. My only two concerns with the file system approach
are (1) ensuring integrity between the file system and database and (2)
backing up and restoring.
There must be some overhead in retrieving binary data from a database,
though I'd say that the article you quote makes it appear worse than it
maybe is. Also, even if it is a bad thing in MySQL, that need not imply
that it is a bad thing in PostgreSQL.

Your two concerns are justified, and they are exactly the price you have
to pay for the performance gain you envision.

My suggestion:
Try it out! Build a little test application against a PostgreSQL database
and see how it performs on your machine. Compare binary objects in the
database against files in the file system.
That way you will get more reliable information based on YOUR setup on
YOUR machine.

Make your decision based on that!

Yours,
Laurenz Albe


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2010, Jelsoft Enterprises Ltd.