#1
#2
I use Berkeley DB in an application to store a large volume of data. The data arrives very frequently, at nearly 50,000 items per second.

So I don't want to use transactions. But without transactions, the db files would be corrupted in a disaster such as a power failure. Losing the data still in memory is acceptable, but the db as a whole must remain intact for subsequent access.
#3
Hi,

> I use Berkeley DB in an application to store a large volume of data. The data arrives very frequently, at nearly 50,000 items per second.

What are these events coming in at 50,000 / second? Is this a sustained or peak rate?

> So I don't want to use transactions. But without transactions, the db files would be corrupted in a disaster such as a power failure. Losing the data still in memory is acceptable, but the db as a whole must remain intact for subsequent access.

I think you'll want transactions, but with the DB_TXN_WRITE_NOSYNC flag, so that the transaction doesn't have to wait for disk I/O at commit time. With that flag, you are guaranteed to be able to recover databases to some consistent point in time, but some of the most recent updates before a crash may be lost.

That said, if you need to achieve a sustained 50,000 updates per second, you will need to think carefully about structuring your data for locality and eliminating contention (are these updates single-threaded?). Any I/O or lock contention would make it difficult or impossible to maintain that sort of throughput.

Regards,
Michael.
#4
50,000 / second is a sustained rate, and the data input may last for months or even years.

If I use transactions and logging, can I control the log files' size?

What about db_dump? Can I use it to recover data after a disaster without transactions?
#5
>> 50,000 / second is a sustained rate, and the data input may last for months or even years. If I use transactions and logging, can I control the log files' size?

> Sure, you can control the size of individual log files (with DB_ENV->set_lg_max), and you can control how many log files Berkeley DB needs to keep by varying the rate of your checkpoints. There is no inherent reason why you can't execute 50,000 transactions / second, but you may also want to consider grouping multiple updates into a single transaction to reduce some overhead (in addition to the DB_TXN_WRITE_NOSYNC flag).

Yes, this is a good idea, I should group multiple updates together to
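The effect of grouping updates, and the log volume involved, can be sketched with plain arithmetic and a small batching helper. This is only an illustration in Python, not Berkeley DB code: the `batched` helper, the batch size of 1,000, the 100-byte log record size, and the 10 MB log file size are all assumptions chosen for the example, not figures from this thread.

```python
from itertools import islice

ITEMS_PER_SECOND = 50_000           # sustained rate stated in the thread
BATCH_SIZE = 1_000                  # assumed number of updates per transaction
LOG_BYTES_PER_ITEM = 100            # assumed average log record size
LOG_FILE_BYTES = 10 * 1024 * 1024   # assumed value passed to set_lg_max


def batched(items, size):
    """Yield lists of up to `size` items; each list stands for one transaction."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def commits_per_second(items_per_second, batch_size):
    """Commits needed per second (rounded up) when batch_size updates share one commit."""
    return -(-items_per_second // batch_size)


def log_files_per_hour(items_per_second, bytes_per_item, log_file_bytes):
    """Rough number of log files filled per hour, rounded up."""
    bytes_per_hour = items_per_second * bytes_per_item * 3600
    return -(-bytes_per_hour // log_file_bytes)


if __name__ == "__main__":
    one_second_of_items = range(ITEMS_PER_SECOND)
    batches = list(batched(one_second_of_items, BATCH_SIZE))
    print(len(batches), "commits instead of", ITEMS_PER_SECOND)
    print(log_files_per_hour(ITEMS_PER_SECOND, LOG_BYTES_PER_ITEM, LOG_FILE_BYTES),
          "log files per hour under these assumptions")
```

Under these assumed numbers the log grows by roughly 1,700 files an hour, which suggests why the checkpoint rate matters for a feed that runs for months.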
> Another issue you will need to consider is the appropriate access method. If you use keys that are allocated sequentially, you should be able to get good cache locality with a btree, recno or queue database. Only queue will perform well if there are concurrent updates to the head or tail, though. I'm going to assume that these are single-threaded inserts.

The app collects data from a large number of sources. The sources will
> What do your queries look like? Can you partition the data based on when it arrives? That is, can you have a separate table for each hour, for example? Otherwise, another issue you will face is that as the database gets bigger and bigger, you will need to walk down more levels to get to the leaf nodes (assuming we are talking about a btree or recno database). That will also have an impact on performance.

Usually, a client will ask the app to retrieve one source's historical
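Partitioning by arrival time, and the growth in tree depth it is meant to avoid, could look something like the following sketch. The file naming scheme and the `partition_name` and `btree_levels` helpers are hypothetical, and the fan-out of 100 keys per page is an assumed round number; real fan-out depends on page size and key size.

```python
from datetime import datetime, timezone

ITEMS_PER_SECOND = 50_000   # sustained rate stated in the thread
ASSUMED_FANOUT = 100        # assumed keys per btree page


def partition_name(timestamp, prefix="events"):
    """Map a Unix timestamp to a hypothetical one-database-per-hour file name."""
    t = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return f"{prefix}-{t:%Y%m%d%H}.db"


def btree_levels(n_keys, fanout=ASSUMED_FANOUT):
    """Smallest number of levels whose capacity (fanout ** levels) holds n_keys."""
    levels, capacity = 1, fanout
    while capacity < n_keys:
        capacity *= fanout
        levels += 1
    return levels


if __name__ == "__main__":
    print(partition_name(0))   # first hour of the Unix epoch, in UTC
    per_hour = ITEMS_PER_SECOND * 3600
    per_year = per_hour * 24 * 365
    print(btree_levels(per_hour), "levels for one hour of data")
    print(btree_levels(per_year), "levels for one year of data")
```

With these assumptions an hourly table stays a couple of levels shallower than a single table holding a year of data, and a query for one source's history over a known time range only has to open the tables for those hours.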
>> What about db_dump? Can I use it to recover data after a disaster without transactions?

> You may be able to salvage some data this way, but there are no guarantees unless you use transactions.

No guarantee? Does that mean the data in the db files may be lost entirely?
> Michael.