dbTalk Databases Forums  

DBMS suited to querying individual words in BLOBS (and otherrequirements)

comp.databases comp.databases


Discuss DBMS suited to querying individual words in BLOBS (and otherrequirements) in the comp.databases forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
florian
 
Posts: n/a

Default DBMS suited to querying individual words in BLOBS (and otherrequirements) - 02-16-2009 , 05:23 AM






Dear, well, database experts?

I am seeking to design a single-user database backend that is well-
suited to a somewhat impure task: I would like to store so-called
translation memories in it and query them for individual words, or, if
reasonably feasible, even patterns, such as regular expressions.
Translation memories are effectively long lists of pairs of sentences,
one in one language, the other one in the other (you may also have
more than two languages, but that's not the point here). Each pair
usually also has an ID. In a table, thus, a record would basically
look sth like this (should look right when viewed with a monospaced
font):


ID English German

1 The quick brown Der flinke braune
fox jumped over Fuchs sprang über
the lazy dog. den trägen Hund.


I am aware that I can retrieve such records using a LIKE statement,
but I understand what I am about to do here is not exactly what
database theory is about since the data are not atomic. Put in another
way, I understand that if you always query a table using LIKE
statements, it is probably poorly designed (or so I seem to have
gatherered). Yet I don't see how I could make these data atomic.

The whole point of this is that searching for such matches in a lot of
files is unacceptably slow, and I hope that even if this might not be
a classic database task, the latter should be much faster at it.

I would thus have two questions: 1, is there a standard design
solution to this problem? An "index", perhaps? 2, which SQL DBMS would
you think is best suited to this task?

To make the choices more limited, I would also have the following
requirements:

a) the DBMS should be open source or, at least, free

b) it should run on Linux and Windows (in a native version), and it
should be possible to use the same database under both OSes (e.g. boot
Linux and store records, then boot Windows and query them).

c) it should be space-, rather than time-efficient, i.e. it should be
good about using as little disk-space as possible (as to the encoding,
iso-8859-1 would be sufficient). (I know, the most space-efficient
solution would be not to use a DBMS, but to search text files, but
that takes SO long I hope it is possible to achieve a good compromise
with a DBMS.)

d) ideally, it would also offer sth like "lazy loading" of tables,
i.e. only load them into memory when they are queried (but that is
perhaps not that crucial)

I have run an attempt with MySQL, and apparently run into problems as
to requirement b). I figure the reason might be that I haven't been
able to get exactly matching versions for the two OSes.

That's probably enough for one question ... I would be very glad if
you let me know your opinion. As might be apparent from what I've
written, I'm new to databases but have read a book on SQL, so I hope
to have some measure of grasp.

Thanks very much in advance!

Florian

Reply With Quote
  #2  
Old   
Walter Mitty
 
Posts: n/a

Default Re: DBMS suited to querying individual words in BLOBS (and other requirements) - 02-16-2009 , 09:45 AM







"florian" <lorian (AT) fsavigny (DOT) de> wrote

Quote:
Dear, well, database experts?

I am seeking to design a single-user database backend that is well-
suited to a somewhat impure task:
Why do you want a DBMS?

I understand why you want to store this data in a persistent fashion. I
also understand why you want to call the container a "database". I'll even
agree that it is a database, in the broadest sense of that word.

However, when most of us self styled experts discuss databases and DBMSes in
here, we are referring to products that are organized around tables,
columns, domains, SQL, the relational data model, and like that.

There may well be some DBMS products that are peculiarly well suited for the
project you have at hand. I personally don't know any of them. And if I
were to take on the project you have taken on, I'd invest quite a bit of
effort at recasting the data requirements in ways that "make sense" within
the framework of the relational data model. I would NOT expect this to fall
within the range of "data modeling 101". And my effort might well fail.

I suggest that look for tools that are peculiarly well suited for what you
have at hand. I would be looking for tools that offer queries based on
regex, and persistent storage in files. I suggest that you postpone the
search for a DBMS suited to your needs until you've done this first.






Reply With Quote
  #3  
Old   
Walter Mitty
 
Posts: n/a

Default Re: DBMS suited to querying individual words in BLOBS (and other requirements) - 02-16-2009 , 09:45 AM




"florian" <lorian (AT) fsavigny (DOT) de> wrote

Quote:
Dear, well, database experts?

I am seeking to design a single-user database backend that is well-
suited to a somewhat impure task:
Why do you want a DBMS?

I understand why you want to store this data in a persistent fashion. I
also understand why you want to call the container a "database". I'll even
agree that it is a database, in the broadest sense of that word.

However, when most of us self styled experts discuss databases and DBMSes in
here, we are referring to products that are organized around tables,
columns, domains, SQL, the relational data model, and like that.

There may well be some DBMS products that are peculiarly well suited for the
project you have at hand. I personally don't know any of them. And if I
were to take on the project you have taken on, I'd invest quite a bit of
effort at recasting the data requirements in ways that "make sense" within
the framework of the relational data model. I would NOT expect this to fall
within the range of "data modeling 101". And my effort might well fail.

I suggest that look for tools that are peculiarly well suited for what you
have at hand. I would be looking for tools that offer queries based on
regex, and persistent storage in files. I suggest that you postpone the
search for a DBMS suited to your needs until you've done this first.







Reply With Quote
  #4  
Old   
florian
 
Posts: n/a

Default Re: DBMS suited to querying individual words in e.g. BLOBs, or - 02-18-2009 , 10:28 AM




Hi Walter,

thanks for your answer; I appreciate your cautious approach. Even if I
initially thought you got me all wrong, your main point is perhaps an
important question, namely, why I want to use exactly a DBMS to handle
the kind of task I want to handle. The main answer is, I cannot yet
think of anything better. Also, I know a commercial application which
does just that, i.e. store TMs in an MS Access database (though I
haven't studied the tables in detail yet, and I am unsure how space-
economic it is).

The "theoretical" point is that to me, the data in question would seem
to fit to a table model, if a very primitive one (cf. the table I
posted). I seem to understand that you would object that the mere
existence of two fields doesn't make data fit the relational data
model, which is probably entirely true. So I would have to rephrase my
question to: whether or not this is the task DBMSes are best at, does
the idea make sense at all? Looking at the PostgreSQL documentation,
which has a chapter on "Full Text Search", I would conclude "why not?"
- I actually assumed that more or less all DBMSes have text search
capabilities (regex search is not a requirement for me!), and expected
some people could tell me which ones are best suited. How wasteful
such a DBMS approach would be is probably be central to this
consideration (I can see that text-searching capabilities might be
intended as an add-on, rather than as a core feature of a DBMS, and
not have such high performance).

You may or course be right in suggesting that there might be other
kinds of applications which are better suited to this task, and that I
should not focus on DBMSes prematurely. The only kind that would come
to my mind, however, would be a full-text indexer (such as, e.g.
Xapian), provided you can teach it not to index whole files, but
passages in files instead (i.e., single sentences). You would then use
that information to retrieve the whole record from the XML file -
maybe that would be fast. But am I missing something; are there other
possibilities?

Thanks very much and best regards,

Florian

Reply With Quote
  #5  
Old   
Ed Prochak
 
Posts: n/a

Default Re: DBMS suited to querying individual words in BLOBS (and otherrequirements) - 02-19-2009 , 09:46 AM



On Feb 16, 6:23*am, florian <lor... (AT) fsavigny (DOT) de> wrote:
Quote:
Dear, well, database experts?

I am seeking to design a single-user database backend that is well-
suited to a somewhat impure task: I would like to store so-called
translation memories in it and query them for individual words, or, if
reasonably feasible, even patterns, such as regular expressions.
Translation memories are effectively long lists of pairs of sentences,
one in one language, the other one in the other (you may also have
more than two languages, but that's not the point here). Each pair
usually also has an ID. In a table, thus, a record would basically
look sth like this (should look right when viewed with a monospaced
font):

ID * English * * * * * * German

1 * *The quick brown * * Der flinke braune
* * *fox jumped over * * Fuchs sprang über
* * *the lazy dog. * * * den trägen Hund.

I am aware that I can retrieve such records using a LIKE statement,
but I understand what I am about to do here is not exactly what
database theory is about since the data are not atomic. Put in another
way, I understand that if you always query a table using LIKE
statements, it is probably poorly designed (or so I seem to have
gatherered). Yet I don't see how I could make these data atomic.

The whole point of this is that searching for such matches in a lot of
files is unacceptably slow, and I hope that even if this might not be
a classic database task, the latter should be much faster at it.

I would thus have two questions: 1, is there a standard design
solution to this problem? An "index", perhaps?
Specialized indices, yes.

Quote:
.... 2, which SQL DBMS would
you think is best suited to this task?

Look into
Oracle® Text Application Developer's Guide
This will do exactly what you want.


Quote:
To make the choices more limited, I would also have the following
requirements:

a) the DBMS should be open source or, at least, free
If you are using this for non-commercial use, i.e., just for
development, then you can use Oracle for free.

Quote:
b) it should run on Linux and Windows (in a native version), and it
should be possible to use the same database under both OSes (e.g. boot
Linux and store records, then boot Windows and query them).
Oracle runs on both OS's, but the second part of your requirement is
yuck. I know of no DBMS that can work that way, except passibly some
flat file databases. You might build a DB engine in PERL to do that.
Quote:
c) it should be space-, rather than time-efficient, i.e. it should be
good about using as little disk-space as possible (as to the encoding,
iso-8859-1 would be sufficient). (I know, the most space-efficient
solution would be not to use a DBMS, but to search text files, but
that takes SO long I hope it is possible to achieve a good compromise
with a DBMS.)
First you say you want to save space so likely you want the text
compressed. But then you think the search "takes SO long I hope it is
possible to achieve a good compromise". You are going to have to
compromise.

Quote:
d) ideally, it would also offer sth like "lazy loading" of tables,
i.e. only load them into memory when they are queried (but that is
perhaps not that crucial)
Caching data is a standard feature of a decent DBMS. Oracle certainly
does it.

Quote:
I have run an attempt with MySQL, and apparently run into problems as
to requirement b). I figure the reason might be that I haven't been
able to get exactly matching versions for the two OSes.
I'd still be very surprised to hear this works even if you got exactly
the same versions of MySQL in both OS's. Is this something MySQL
claims to do?

Quote:
That's probably enough for one question
well enough for one posting. There were certainly more than one
question. 8^)

Quote:
... I would be very glad if
you let me know your opinion. As might be apparent from what I've
written, I'm new to databases but have read a book on SQL, so I hope
to have some measure of grasp.

Thanks very much in advance!

Florian
I hope you don't think reading a SQL book tells you anything about
Databases. SQL is a query language. Knowing a language tells you
little about the related subjects. Reading a book on Visual BASIC
teaches you little about software engineering.

You have a lot of learning to do.
Ed



Reply With Quote
  #6  
Old   
florian
 
Posts: n/a

Default Re: DBMS suited to querying individual words in BLOBS (and other - 02-19-2009 , 03:22 PM




Hi Ed,

thanks a lot for your answer.

Quote:
Specialized indices, yes. Look into Oracle® Text Application Developer's Guide
Thanks very (!) much for the hint; I shall do that.

Quote:
It should be possible to use the same database under both OSes (e.g. boot
Linux and store records, then boot Windows and query them).

the second part of your requirement is yuck. I know of no DBMS that can work that way, except passibly some
flat file databases.
Damn it. Is there some fundamental reason for this? (I mean, some
basic characteristic all DMBSes share that prevents this from
working?)

Quote:
You are going to have to compromise.
Currently, one search over the files takes thirty seconds, and getting
down to perhaps two would make the data usable.

Quote:
I'd still be very surprised to hear this [i.e. the above] works even if you got exactly
the same versions of MySQL in both OS's. Is this something MySQL claims to do?
Not that I know; I just tested. But the fact that you know of NO DBMS
which is capable of doing this really warrants further inquiry ...
well, I already said this. (I can see that DBMSes' databases don't
NEED to be portable - they aren't usually passed around. But it would
be really important for me to know if they CAN'T be.)

Quote:
I hope you don't think reading a[n] SQL book tells you anything about
Databases.
Although I certainly see your point, I think you're exaggerating a
little - I would probably find it very hard to understand anything
about SQL at all without understanding what it does (and for that
matter, SQL is so unattractive as a language that I would certainly
shun it if it weren't for the databases). But anyway, the book I read
was rather a sort of hands-on tutorial which tried to show how
intuitive, and bad, and (later) how good tables are designed and how
this is practically done using SQL, which seemed to make a lot of
sense. (It did not try to teach application-specific details.) Anyway,
I just mentioned my reading for the sake of communication, i.e. to
indicate that I can be expected to know what records, tables, rows,
columns, IDs, VARCHARs, BLOBs, and a lot of other basic stuff are
(roughly what I presumed equals a "basic grasp").

Quote:
You have a lot of learning to do.
Wouldn't the extent of the "lot" depend on what I want to achieve?
(Experience tells me that's probably right anyway, however.)
Incidentally, I just responded a little more in length because both of
you who replied seemed to exhibit something of a, well, high-browed
attitude. This would probably be appropriate if I tried to produce
something for the use of other people that needs to be robust, but I
don't. I would be well served if I could just knock something together
that does what I need, and I imagine this might work even without
learning how to set up other sorts of databases.

Best regards,

Florian


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.