Mitch wrote:
Quote:
I am considering BerkeleyDB as an alternative to a traditional
relational data warehouse solution (i.e. star schema).
Am I being foolhearty? |
Short answer I think is 'no'.
Quote:
Our fact table often receives 10 million rows per day, with storage
capacity being the only real upper limit.
My current thinking is to implement the fact table as multiple
databases (ala partitioned tables in Oracle), with each fact database
covering a distinct time sequence (i.e. month/week). Each attribute
table will be a seperate database. My db environment will contain all
attribute databases, and some (MRU) subset of fact databases.
Each fact record currently consumes ~104 bytes, resulting in > 1Gig per
day. |
I've seen single BDB files that were 10's of Gigs. The theoretical
maximum database size is very large indeed. I don't think you will
see any problems just from the volume of data you're storing.
The need for partitioning I'm not sure about : there are a few
different
reasons to partition. You'd need to say more about your goals
with partitioning I think (safety, performance, operations issues...).
Quote:
I'm looking for opinions, is BerkeleyDB up to such a challenge? How
would this design perform relative to Oracle/MSSQL? |
I think performance would depend on the query execution code you build.
Insert performance and the bare-metal b-tree lookups should perform
at roughly the same level as other database products (they're all using
similar techniques). I'm not sure what concurrency control issues you
may have : if those are challenging then something with MVCC might
perform better (BDB uses page-level locking for concurrency control).
If it were me, the next thing I'd do is build a simple prototype
application
that can generate the type of data and access pattern that you expect
to have
against BDB. Run that thing and see how it performs.