![]() | |
#1
| |||
| |||
|
#2
| |||
| |||
|
|
Hi, I have a huge database and I need to frequently do a COUNT() on a column to see how many rows there are in the table. Sometimes I need to count rows based on some condition but mostly it's just counting the total rows in the table. It takes forever to do this and (I'm using postgreSQL) when I try doing a "EXPLAIN" on the query, it says it's going to 'sequentially' scan through each the table. Is there some efficient way of doing a COUNT? Can someone please help me out with this? What do people do when their tables have billions of rows? Thanks, |
#3
| |||
| |||
|
|
"Steve" <nospam@nopes> schrieb im Newsbeitrag news:4032ea88$1 (AT) clarion (DOT) carno.net.au... Hi, I have a huge database and I need to frequently do a COUNT() on a column to see how many rows there are in the table. Sometimes I need to count rows based on some condition but mostly it's just counting the total rows in the table. It takes forever to do this and (I'm using postgreSQL) when I try doing a "EXPLAIN" on the query, it says it's going to 'sequentially' scan through each the table. Is there some efficient way of doing a COUNT? Can someone please help me out with this? What do people do when their tables have billions of rows? Thanks, I'd say you need at least one index expecially for those counts where there is a criterion. Did you create an index that contains all fields used in the criterion? The index might even help for counting all. It might be better then to use "select count(colum_in_index) from your_table". |
#4
| |||
| |||
|
|
I have a huge database and I need to frequently do a COUNT() on a column to see how many rows there are in the table. Sometimes I need to count rows based on some condition but mostly it's just counting the total rows in the table. It takes forever to do this and (I'm using postgreSQL) when I try doing a "EXPLAIN" on the query, it says it's going to 'sequentially' scan through each the table. |

|
Is there some efficient way of doing a COUNT? Can someone please help me out with this? What do people do when their tables have billions of rows? Thanks, |
|
I'd say you need at least one index expecially for those counts where there is a criterion. |

#5
| |||
| |||
|
|
Centuries ago, Nostradamus foresaw when "Robert Klemme" <bob.news (AT) gmx (DOT) net> would write: "Steve" <nospam@nopes> schrieb im Newsbeitrag news:4032ea88$1 (AT) clarion (DOT) carno.net.au... Hi, |
|
Unfortunately, the MVCC system that improves performance for multitudes of concurrent users means that there _isn't_ a ready automatic way of keeping precise statistics on COUNT(*) on the table. Oracle and DB/2 have similar problems, these days. I suppose that someone (perhaps I) will have to see about implementing the trigger-based scheme for having counters for tables. The essential idea: |
#6
| |||
| |||
|
|
Christopher Browne <cbbrowne (AT) acm (DOT) org> wrote Centuries ago, Nostradamus foresaw when "Robert Klemme" <bob.news (AT) gmx (DOT) net> would write: "Steve" <nospam@nopes> schrieb im Newsbeitrag news:4032ea88$1 (AT) clarion (DOT) carno.net.au... Hi, [...] Unfortunately, the MVCC system that improves performance for multitudes of concurrent users means that there _isn't_ a ready automatic way of keeping precise statistics on COUNT(*) on the table. Oracle and DB/2 have similar problems, these days. I suppose that someone (perhaps I) will have to see about implementing the trigger-based scheme for having counters for tables. The essential idea: Hi Christopher, judging from earlier posts you seem to have a lot of knowledge and insight about postgres. I've been trying to find out if there are any plans for snapshots (or summary tables) in postgres, but havent found any answers regarding this. Someone mentioned that the locking mechanism used by postgres makes it difficult to implement. Do you know if there is any truth in this? Also, difficult is not the same as impossible, so are there any plans for such? |
#7
| ||||
| ||||
|
|
Centuries ago, Nostradamus foresaw when lennart (AT) kommunicera (DOT) umea.se (Lennart Jonsson) would write: Christopher Browne <cbbrowne (AT) acm (DOT) org> wrote Centuries ago, Nostradamus foresaw when "Robert Klemme" <bob.news (AT) gmx (DOT) net> would write: "Steve" <nospam@nopes> schrieb im Newsbeitrag news:4032ea88$1 (AT) clarion (DOT) carno.net.au... Hi, |
|
Hi Christopher, judging from earlier posts you seem to have a lot of knowledge and insight about postgres. I've been trying to find out if there are any plans for snapshots (or summary tables) in postgres, but havent found any answers regarding this. Someone mentioned that the locking mechanism used by postgres makes it difficult to implement. Do you know if there is any truth in this? Also, difficult is not the same as impossible, so are there any plans for such? Can you elaborate by what you mean by that? First guess is that you're thinking of something akin to materialized views, perhaps "materialized aggregate views." The challenges with that aren't about locking mechanism, but rather with doing all the work about three times. (Once, in the source table, once, to figure out what to update in the "child" tables, and, in effect, once to actually put the updates into the "child" tables.) But it's quite possible that I am guessing badly. |
|
A coworker is working on things akin to this; the answers are still emerging. His approach has been to have "loader" programs in Java that generate summary information as it loads in the details. That amounts to doing the work twice. I have kept harping on the notion that the best idea is to start by generating views that calculate summary information, and tuning that when it turns out to be truly necessary to do so. You can't readily know, a priori, what parts are going to be slow and hence need optimization, so there's little sense in optimizing until you know what parts are slow and need it. |
|
That's certainly not the same thing as "locking problems," so it is possible that I'm on a different page... |
#8
| |||
| |||
|
|
In general I agree. Still it is much easier to transform a view into a summary table when there is a need for optimization, than to start mucking around with triggers and such to update "aggregate values" in "manual summary tables". The summary table in db2 can also be specified such that the optimizer can use it. Thus, by creating a summary table a critical query can run faster without rewriting it. |
#9
| |||
| |||
|
|
Clinging to sanity, lennart (AT) kommunicera (DOT) umea.se (Lennart Jonsson) mumbled into her beard: |
|
There's a project ongoing for this... http://gborg.postgresql.org/project/...rojdisplay.php Not clear that it will be "complete" terribly soon, but it at least provides a place to look for answers... |
#10
| |||
| |||
|
![]() |
| Thread Tools | |
| Display Modes | |
| |