![]() | |
#11
| ||||
| ||||
|
|
Ignoramus3367 <ignoramus3367 (AT) NOSPAM (DOT) 3367.invalid> wrote: On 2011-09-08, Peter H. Coffin <hellsop (AT) ninehells (DOT) com> wrote: On Thu, 08 Sep 2011 13:33:51 -0500, Ignoramus3367 wrote: I am building a data warehouse for data mining purposes. It will be a RAID 6 array with 12 2TB drives. Server had 48 GB of memory, 12 intel CPUs, 3Ware 9750 for RAID card. It will have several tables, with one biggest by far. My design principles are 1) Use InnoDB with a separate file per table I'm curious as to your thinking on this one. Not much, I just do not want this database intermixed with any other. That's not an answer, really. InnoDB does not shine too much if your data set is much larger than the buffer pool because InnoDB funnels *all* I/O through that pool. |
|
MyISAM OTOH puts only indexes in memory and is thus better in this situation. However you certainly don't want to use MyISAM for such amounts of data because it would take ages to check or repair a table after a crash. |
|
But then MariaDB was designed exactly for that: it's crashsafe and uses memory in a comparable fashion. |
|
There are also third party storage engines for MySQL that are targeted at data mining: http://www.mysql.com/why-mysql/data-warehouse.html I have no hands-on experience with those though. |
#12
| |||
| |||
|
|
On 2011-09-09, Jerry Stuckle <jstucklex (AT) attglobal (DOT) net> wrote: I do agree with Peter on one thing - you shouldn't be selecting the RDBMS you're using yet. You don't even have a clear idea of WHAT you're going to do, much less HOW you're going to do it. And some of your ideas on operations are questionable - like inserting into a temp table then into the main table. You very well could be hurting performance. Personally, I wouldn't use MySQL for such a large database; it will work, but there are other products better suited to such things. I am EXTREMELY open to suggestions here. My experience with large mysql datasets have not been very encouraging. I am all ears and I would love to hear more. What else? Postgres? |
#13
| |||
| |||
|
|
On Thu, 08 Sep 2011 21:43:00 -0500, Ignoramus3367 wrote: On 2011-09-09, Jerry Stuckle <jstucklex (AT) attglobal (DOT) net> wrote: I do agree with Peter on one thing - you shouldn't be selecting the RDBMS you're using yet. You don't even have a clear idea of WHAT you're going to do, much less HOW you're going to do it. And some of your ideas on operations are questionable - like inserting into a temp table then into the main table. You very well could be hurting performance. Personally, I wouldn't use MySQL for such a large database; it will work, but there are other products better suited to such things. I am EXTREMELY open to suggestions here. My experience with large mysql datasets have not been very encouraging. I am all ears and I would love to hear more. What else? Postgres? What's your budget, realistically? I don't have a good reason to recommend Postgres over MySQL for this; they're both in the "good enough for some purposes and the price sure is nice" category. But the one thing that setting up a data warehouse that is fast, reliable, and easy (-ish) to manage is *not* is inexpensive. Databases that scale well and are (nearly) failure-resistant aren't free. Backup solutions and storage manages aren't free. Middleware software that can help automate and schedule rebuilding of your warehouse data periodically isn't free either. But sometimes it's worth it. This is a very good point: if it starts t get expensive to work around |
#14
| |||
| |||
|
|
I am building a data warehouse for data mining purposes. Could you criticize anything in this approach. |
#15
| |||
| |||
|
|
ignoramus3367 (AT) NOSPAM (DOT) 3367.invalid says... I am building a data warehouse for data mining purposes. Could you criticize anything in this approach. Are you actually building a "real" data warehouse, or simply a reporting silo/datamart copy? By "real" I mean ETL transformation from the source to create a dimensionally-modelled star-schema data representation. Have you read anything by Bill Inmon, Ralph Kimball or Bert Scalzo? |
#16
| |||
| |||
|
|
I am building a data warehouse for data mining purposes. It will be a RAID 6 array with 12 2TB drives. Server had 48 GB of memory, 12 intel CPUs, 3Ware 9750 for RAID card. |
![]() |
| Thread Tools | |
| Display Modes | |
| |