sparse matrix representation -
10-26-2005
, 03:04 PM
Hi all,
Here is data structure problem that some of you might have already
solved or have a good idea on how to solve. How to represent (in
Berkeley DB of course) a very large sparse matrix with the following
characteristics:
1) Number of rows is in the hundreds of thousands.
2) Number of columns is in the range 500-1500.
3) Each cell follows a single-writer/multiple readers patterns where
cells in the same column have the same writer.
4) Retrieval of the non-zero cells of whole rows should be fast.
5) Cell update should be super fast. Updates are quit frequent.
6) Cells are generally updated individually. However, from business
logic perspective it also makes sense to group updates for a single
column, spanning several rows.
7) The sparseness is the following: only a few thousand cells along a
given column are non-zero.
I've been banging my head over this for the past couple of days, so I
thought I'd share it with the group.
Best,
Boris |