DBMS recommendation?

comp.databases

#1
JuergenRiemer
DBMS recommendation? - 01-05-2009, 08:40 AM

Hi,
We are currently thinking of replacing our existing database system
which is no longer supported.
The data corpus encompasses 4 million records, each of which has
about 30 fields. Half a million records have a full text in PDF
format, which we also store as a text field (extracted by a self-made
PDF extraction script).
Our DB usage is moderate, about 1000 searches per day. We load-balance
with pound, which deals queries out to 6 different virtual machines
(each holding a copy of the database).
The web interface is decoupled from the DB; we use an MVC framework
that talks to the DB via an API and retrieves data only.
Currently we use a Windows system which can easily be replaced with
Linux if need be.
Our existing solution has a built-in thesaurus (the controlled
vocabulary is static); in addition, every term holds the number of
records currently tagged with it.
The new solution should of course perform well. Thesaurus
functionality would be nice, as would relevance ranking and proximity
searching, but none of these is a must.
A cost-free solution would be desirable: we want to open our database
to the internet, so we might need to add new instances of the DB
(virtual machines). Paying for additional licences would be too
expensive for us.
What would you recommend?
In addition:
1. Our data repository is a large XML file from which we update our
database on a weekly basis by means of a self-made update script.
Would an XML database be an alternative, especially with regard to
performance?
2. I was also asked to investigate the possibility of implementing a
federated search on 2 to 3 other sources (different data structures). I
assume this would be a different beast and not a feature of the new DB
I am looking for. This is not a requirement yet, but what options
would I have in that regard?
Many thanks for your input,
JR
PS: If this group is not the right place please point me to a proper
one
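
On point 1 above (the weekly update from a large XML file): whichever DBMS is chosen, the loader can stream the file and upsert record by record instead of parsing the whole document at once. The following is only a minimal sketch under stated assumptions. The element names (`record`, `title`, `body`) and the table layout are hypothetical, and SQLite from the Python standard library is used purely as a stand-in so the example runs without a server.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET


def load_records(xml_source, conn):
    """Stream <record> elements from xml_source and upsert them by id.

    The element and column names are illustrative assumptions, not the
    poster's actual schema.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records ("
        "id TEXT PRIMARY KEY, title TEXT, body TEXT)"
    )
    count = 0
    # iterparse hands over each element once its closing tag has been read,
    # so the file is processed incrementally rather than loaded in one go.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "record":
            continue
        conn.execute(
            "INSERT OR REPLACE INTO records (id, title, body) VALUES (?, ?, ?)",
            (elem.get("id"), elem.findtext("title"), elem.findtext("body")),
        )
        count += 1
        elem.clear()  # drop this record's children and text to free memory
    conn.commit()
    return count


if __name__ == "__main__":
    sample = (
        b"<records>"
        b"<record id='1'><title>A</title><body>x</body></record>"
        b"<record id='2'><title>B</title><body>y</body></record>"
        b"</records>"
    )
    connection = sqlite3.connect(":memory:")
    print(load_records(io.BytesIO(sample), connection))
```

Running the loader again with a changed record replaces the existing row, because the record id is the primary key. With MySQL or PostgreSQL the upsert statement would be spelled differently, but the streaming part stays the same.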

#2
bukzor
Re: DBMS recommendation? - 01-05-2009, 01:52 PM

On Jan 5, 6:40 am, JuergenRiemer <juergen.rie... (AT) gmail (DOT) com> wrote:
Quote:
[..]
I can't speak for all databases, but I'm quite familiar with MySQL and
it meets all of your needs. It's quite fast and can easily handle
datasets of that size (given proper indexing). It also has full-text
search which you can leverage nicely.

http://dev.mysql.com/doc/refman/5.0/...xt-search.html


#3
JuergenRiemer
Re: DBMS recommendation? - 01-06-2009, 04:37 AM

Quote:
I can't speak for all databases, but I'm quite familiar with MySQL and
it meets all of your needs. It's quite fast and can easily handle
datasets of that size (given proper indexing). It also has full-text
search which you can leverage nicely.

http://dev.mysql.com/doc/refman/5.0/...xt-search.html
thanks for the hint!
A question: is there any support for a "real" thesaurus (broader
terms, narrower terms) - perhaps also to display or use the thesaurus
to browse the data corpus? - in either mySQL or postGre?


#4
Robert Klemme
Re: DBMS recommendation? - 01-06-2009, 05:25 AM

On 06.01.2009 11:37, JuergenRiemer wrote:
Quote:
I can't speak for all databases, but I'm quite familiar with MySQL and
it meets all of your needs. It's quite fast and can easily handle
datasets of that size (given proper indexing). It also has full-text
search which you can leverage nicely.

http://dev.mysql.com/doc/refman/5.0/...xt-search.html

thanks for the hint!
A question: is there any support for a "real" thesaurus (broader
terms, narrower terms) - perhaps also to display or use the thesaurus
to browse the data corpus? - in either mySQL or postGre?
I assume there is no thesaurus functionality in MySQL. But maybe this
page helps as a starting point, there are some thesaurus libs referenced:

http://search.cpan.org/~joseibert/Th...YNONYM_SOURCES

And this looks promising as well:

http://www.sequencepublishing.com/thesage.html

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end


#5
JuergenRiemer
Re: DBMS recommendation? - 01-06-2009, 05:36 AM

Quote:
I assume there is no thesaurus functionality in MySQL. But maybe this
page helps as a starting point, there are some thesaurus libs referenced:

http://search.cpan.org/~joseibert/Th...b/Thesaurus/DB...

And this looks promising as well:

http://www.sequencepublishing.com/thesage.html

Kind regards

robert
Thanks Robert,

what I was looking for is a hierarchical thesaurus not a linguistic
one.
I found this open source project for mySQL
http://tematres.r020.com.ar/index.en.html
looks promising as well... in order to show the number of records
tagged with a certain term one would have to modify the code I am
afraid, but still a good starting point.
Has anyone found something similar for PostGre?
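
To make the requirement concrete: a hierarchical thesaurus with per-term record counts can be modelled with two plain tables and a recursive query, independent of any thesaurus package. The following is only a sketch under stated assumptions. The table and column names (`term`, `broader_id`, `record_term`) and the sample terms are hypothetical, and SQLite from the Python standard library is used as a stand-in so the example runs as-is. PostgreSQL (8.4 and later) and MySQL (8.0 and later) offer the same `WITH RECURSIVE` construct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE term (
    id INTEGER PRIMARY KEY,
    label TEXT NOT NULL,
    broader_id INTEGER REFERENCES term(id)  -- NULL for top terms
);
CREATE TABLE record_term (
    record_id INTEGER NOT NULL,
    term_id INTEGER NOT NULL REFERENCES term(id)
);
INSERT INTO term VALUES
    (1, 'Science', NULL), (2, 'Physics', 1), (3, 'Optics', 2), (4, 'Chemistry', 1);
INSERT INTO record_term VALUES (100, 2), (101, 3), (102, 3), (103, 4);
""")


def narrower_with_counts(conn, term_id):
    """Return (label, depth, tagged record count) for a term and every
    narrower term below it, ordered by depth and then label."""
    return conn.execute(
        """
        WITH RECURSIVE subtree(id, label, depth) AS (
            SELECT id, label, 0 FROM term WHERE id = ?
            UNION ALL
            SELECT t.id, t.label, s.depth + 1
            FROM term t JOIN subtree s ON t.broader_id = s.id
        )
        SELECT s.label, s.depth, COUNT(rt.record_id)
        FROM subtree s LEFT JOIN record_term rt ON rt.term_id = s.id
        GROUP BY s.id, s.label, s.depth
        ORDER BY s.depth, s.label
        """,
        (term_id,),
    ).fetchall()


if __name__ == "__main__":
    for label, depth, count in narrower_with_counts(conn, 1):
        print("  " * depth + f"{label} ({count})")
```

The counts here are computed on demand. For a static vocabulary that is only reloaded weekly, storing them in a column refreshed by the update script would be an equally reasonable choice.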


#6
Thomas Kellerer
Re: DBMS recommendation? - 01-06-2009, 05:45 AM

JuergenRiemer wrote on 05.01.2009 15:40:
Quote:
Hi,
We are currently thinking of replacing our existing database system
which is no longer supported.
The data corpus encompasses 4 million records each of which having
about 30 fields. Half a million records would have a full text in PDF
format which we also put as a text field (by a self-made PDF
extraction script).
Our DB usage is moderate, about 1000 searches per day.
PostgreSQL is another choice. Its license is a lot more user-friendly.
I still haven't understood under which circumstances I'm allowed to use MySQL
for free. Before the Sun merger they stated "You are free to use MySQL if you
are developing GPL software...".

If you have a system with concurrent reads and writes, PG will perform a lot
better than MySQL and scales a lot better when the number of concurrent sessions
increases.

If your DBMS is (mostly) read-only then both databases are pretty much
comparable in terms of speed. MySQL does very well with simple statements and is
not terribly good with sub-selects and complicated queries.

I'd still go for Postgres as it's much more mature and it also seems that the
quality is better. PG would never go "live" with the bugs that were known to be
in MySQL 5.1.

Additionally there are so many little things in MySQL that annoy me. Especially
their strategy to "accept" certain statements but then simply choose to ignore
them (e.g. CHECK constraints, PK, FK constraints on the wrong storage engine,
just to name the ones that can harm your data integrity).

Postgres also concentrates more on data consistency and making sure you never
lose data, whereas MySQL (as far as I can tell) puts performance first and
worries about data issues later.

Postgres has a good full text integration as well.
I'm not aware of any "thesaurus" for either DBMS.

Quote:
We load balance by means of pound dealing queries to 6
different virtual machines (each holding a copy of the database).
MySQL seems to be stronger when it comes to replication (at least the available
solutions are more user-friendly). But it sounds as if you already have your own
replication system (or did your old DBMS support this out of the box?)

Skype is using PG and they have implemented some good replication and
load-balancing solutions, although they are not very easy to set up and
maintain, if I understand that correctly.

With only 4 million records, I'm not sure I would consider replication and
load-balancing at all, given decent hardware and a good RAID system with PG.
Especially not with the "1000 searches per day".

When reading the PG mailing list, people start thinking about replication a lot
"later" e.g. when several tables go beyond 100 million rows.

Quote:
2. I was also asked to investigate the possibility of implementing a
federated search on 2 to 3 other sources (different data structure). I
assume this then would be a different beast and not a feature for the
above mentioned new DB I am looking for. Indeed this is not a
requirement yet what options would I have in that concern?
In that case, I would go for an external full-text search engine and not use the
DBMS-built-in.
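
Whatever engine ends up doing the indexing, the federated part boils down to querying each source through a small adapter that maps its native result structure onto a common shape, and then merging the hits. The following is only an illustrative sketch of that pattern, not a description of any particular search engine. The `Hit` type, the two toy sources and the assumption that each adapter can normalise its scores to the range 0..1 are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Hit:
    source: str
    title: str
    score: float  # assumed to be normalised to 0..1 by each adapter


# An adapter takes a query string and returns hits in the common shape.
Adapter = Callable[[str], Iterable[Hit]]


def federated_search(query: str, adapters: List[Adapter], limit: int = 10) -> List[Hit]:
    """Query every source and merge the hits by descending score."""
    hits: List[Hit] = []
    for adapter in adapters:
        try:
            hits.extend(adapter(query))
        except Exception:
            # One failing source should not take the whole search down.
            continue
    hits.sort(key=lambda h: h.score, reverse=True)
    return hits[:limit]


# Two toy sources with different native structures (both hypothetical).
CATALOGUE = [{"name": "Optics primer", "rank": 80}, {"name": "Chemistry notes", "rank": 40}]
ARCHIVE = [("Optics in practice", 0.9), ("History of glass", 0.3)]


def search_catalogue(query: str) -> List[Hit]:
    # The catalogue ranks on a 0..100 scale, so divide to normalise.
    return [Hit("catalogue", r["name"], r["rank"] / 100)
            for r in CATALOGUE if query.lower() in r["name"].lower()]


def search_archive(query: str) -> List[Hit]:
    return [Hit("archive", title, score)
            for title, score in ARCHIVE if query.lower() in title.lower()]


if __name__ == "__main__":
    for hit in federated_search("optics", [search_catalogue, search_archive]):
        print(hit)
```

The hard part in practice is the score normalisation, since relevance values from different engines are not directly comparable. The sketch sidesteps that by assumption.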


Some interesting reading about the two of them:

Performance testing in a high-concurrency environment:
<http://tweakers.net/reviews/649/8/database-test-sun-ultrasparc-t1-vs-punt-amd-opteron-pagina-8.html>

A somewhat biased comparison
<http://wiki.postgresql.org/wiki/Why_PostgreSQL_Instead_of_MySQL:_Comparing_Reliability_and_Speed_in_2007>

One of the main PG developers comments on the 5.1 bugs
<http://blog.hagander.net/archives/128-On-the-topic-of-release-quality.html>


Regards
Thomas


#7
JuergenRiemer
Re: DBMS recommendation? - 01-06-2009, 07:38 AM

[..]
Quote:
Regards
Thomas
Thanks Thomas for your great comment!
I will have a look at both.. and still looking for a working thesaurus
implementation in Postgre



#8
Thomas Kellerer
Re: DBMS recommendation? - 01-06-2009, 08:10 AM

JuergenRiemer wrote on 06.01.2009 14:38:
Quote:
[..]
Regards
Thomas

Thanks Thomas for your great comment!
and still looking for a working thesaurus implementation in Postgre
You will have more luck posting questions related to Postgres on the PG mailing
list. It is monitored by a lot of "hard-core" users and the core developers.
pgsql-general is probably your best choice.
The lists are also mirrored on a news server (news.gmane.org)

http://www.postgresql.org/community/lists/

Regards
Thomas


#9
JuergenRiemer
Re: DBMS recommendation? - 01-06-2009, 09:09 AM

[..]
Quote:
You will have more luck posting questions related to Postgres on the PG mailing
[..]
http://www.postgresql.org/community/lists/

Regards
Thomas
Thanks Thomas!
I just sent an email


#10
Thomas Kellerer
Re: DBMS recommendation? - 01-06-2009, 01:16 PM

JuergenRiemer wrote on 06.01.2009 16:09:
Quote:
You will have more luck posting questions related to Postgres on the PG mailing

Thanks Thomas!
I just sent an email
I have seen it.

Btw: it's either Postgres or PostgreSQL but *not* PostGre


