Discussion:
Are bitmap index scans slow to start?
Carlo Stonebanks
2013-02-22 17:50:59 UTC
Hi Jeff, thanks for the reply.



<<

What is going on during the interregnum? Whatever it is, it seems to be
driving the log_2013_01_session_idx index out of the cache, but not the
log_2013_01 table. (Or perhaps the table visit is getting the benefit of
effective_io_concurrency?)
...

Rebuilding the index might help, as it would put all the leaf pages holding
values for session_id=27 adjacent to each other, so they would read from
disk faster. But with a name like "session_id", I don't know how long such
clustering would last though.
>>

Technically, nothing should be happening. We used to keep one massive audit
log, and it was impossible to manage due to its size. We then changed to a
strategy where every month a new audit log would be spawned, and since
log_2013_01 represents January, the log should be closed and nothing should
have changed (it is technically possible that a long-running process would
spill over into February, but not by this much). So, assuming that it's
stable, it should be a very good candidate for reindexing, no?



Our effective_io_concurrency is 1, and last I heard the PG host was a Linux
4-drive RAID10, so I don't know if there is any benefit to raising this
number - and if there was any benefit, it would be to the Bitmap Scan, and
the problem is the data building before the fact.

<<
the bitmap itself doesn't get cached. But the data needed to construct
the bitmap does get cached. It gets cached by the generic caching methods
of PG and the OS, not through something specific to bitmaps.
>>



This has always been a problem for me. I spend hours trying different
strategies and think I've solved the problem, when in fact it seems like a
cache has spun up, and then something else expires it and the problem is
back. Is there a way around this problem? Can I force the expiration of a
cache?



Carlo
Carlo Stonebanks
2013-02-22 17:50:59 UTC
A cool idea, but if I understand it correctly, very specific and fussy. New
DB's are spawned on this model, and all the developers would have to be
aware of this non-standard behaviour, and DBA's would have to create these
indexes every month, for every DB (as the log tables are created every
month). There are 89 session_id values in the January log (log_2013_01), so
this would quickly get out of control. But - like I said - an interesting
idea for more specific challenges.



From: Marc Mamin [mailto:***@intershop.de]
Sent: February 21, 2013 2:41 PM
To: Jeff Janes; Carlo Stonebanks
Cc: pgsql-***@postgresql.org
Subject: AW: [PERFORM] Are bitmap index scans slow to start?
Post by Carlo Stonebanks
Rebuilding the index might help, as it would put all the leaf pages holding
values for session_id=27 adjacent to each other, so they would read from
disk faster. But with a name like >"session_id", I don't know how long such
clustering would last though.
Post by Carlo Stonebanks
If I'm right about the index disk-read time, then switching to a plain
index scan rather than a bitmap index scan would make no difference--either
way the data has to come off the disk.
Post by Carlo Stonebanks
I'd prefer a strategy that allowed fast performance the first time, rather
than slow the first time and extremely fast subsequently.
Hello,

if the index is only used to locate rows for a single session_id, you may
consider splitting it into a set of partial indexes.

e.g.
create index i_0 on foo (session_id) where session_id % 4 = 0;
create index i_1 on foo (session_id) where session_id % 4 = 1;
create index i_2 on foo (session_id) where session_id % 4 = 2;
create index i_3 on foo (session_id) where session_id % 4 = 3;

(can be built in parallel using separate threads)

Then you will have to ensure that all your WHERE clauses also contain the
index condition:

WHERE session_id = 27 AND session_id % 4 = 27 % 4

regards,

Marc Mamin
Nikolas Everett
2013-02-22 19:05:02 UTC
I can't really help, but I can make it more clear why postgres is choosing
a _bitmap_ index scan rather than a regular index scan. With a regular
index scan it pumps the index for the locations of the rows that it points
to and loads those rows as it finds them. This works great if the rows in
the index are sorta sorted - that way it isn't jumping around the table
randomly. Random IO is slow. In a bitmap index scan PG pumps the index
and buffers the row locations by shoving them into a big bitmap. Then it
walks the bitmap in order, producing in-order IO. PG makes the choice based
on a measure of the index's correlation.

The problem comes down to you inserting the sessions concurrently with one
another. My instinct would be to lower the FILLFACTOR on newly created
indices so they can keep their entries more in order. I'm not sure why I
have that instinct but it feels right. Also, you might try clustering
newly created tables on session_id and setting the fillfactor down so rows
with the same session id will stick together on disk.
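
For what it's worth, a sketch of what I mean (the table and index names
here are assumed, and 70 is just an arbitrary starting point):

-- leave free space in the index leaf pages (btree default is 90)
create index log_2013_02_session_idx on log_2013_02 (session_id)
  with (fillfactor = 70);
-- leave free space in each heap page (table default is 100)
alter table log_2013_02 set (fillfactor = 70);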

Now that I look stuff up on the internet I'm not sure where I saw that PG
tries to maintain a cluster using empty space from FILLFACTOR, but I _think_
it does. I'm not sure what is going on with my google-fu today.

Nik


Carlo Stonebanks
2013-02-22 20:05:34 UTC
Post by Nikolas Everett
Also, you might try clustering newly created tables on session_id
and setting the fillfactor down so rows with the same session id will stick
together on disk.
>>



My understanding of PG's cluster is that this is a one-time command that
creates a re-ordered table and doesn't maintain the clustered order until
the command is issued again. During the CLUSTER, the table is read and write
locked. So, in order for me to use this I would need to set up a timed event
to CLUSTER occasionally.
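
For reference, the one-time command would be something like this (a sketch,
assuming the January table and the session index from the plan):

cluster log_2013_01 using log_2013_01_session_idx;

A later bare "cluster log_2013_01;" re-clusters on the remembered index.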
Post by Nikolas Everett
I can't really help, but I can make it more clear why postgres is
choosing a _bitmap_ index scan rather than a regular index scan
>>



The EXPLAIN ANALYZE is showing it is taking a long time to prepare the
bitmap, i.e.:

-> Bitmap Index Scan on log_2013_01_session_idx
   (cost=0.00..63186.52 rows=2947664 width=0)
   (actual time=32611.918..32611.918 rows=2772042 loops=1)
   Index Cond: (session_id = 27)

while the bitmap scan itself is actually very fast. Jeff says that the
bitmap is not cached, so I will assume the PG general caches being created
are of general use.



I think what I need to figure out is:



1) Why does it take 36 seconds to set up the general index caches?

2) What can I do about it (what stats do I need to look at)?

3) How can I force these caches to expire so I can tell if the strategy
worked?









Jeff Janes
2013-02-23 17:45:55 UTC
My understanding of PG’s cluster is that this is a one-time command that
creates a re-ordered table and doesn’t maintain the clustered order until
the command is issued again. During the CLUSTER, the table is read and
write locked. So, in order for me to use this I would need to set up a
timed event to CLUSTER occasionally.
Correct.
The EXPLAIN ANALYZE is showing it is taking a long time to prepare the
bitmap (i.e. -> Bitmap Index Scan on log_2013_01_session_idx
(cost=0.00..63186.52 rows=2947664 width=0) (actual time=32611.918..32611.918
rows=2772042 loops=1) Index Cond: (session_id = 27)); the bitmap scan is
actually very fast. Jeff says that the bitmap is not cached, so I will
assume the PG general caches being created are of general use.
To clarify the "actual time" thing, the first number is not when the node
received its first row from its downstream (or when the node was started,
if it has no downstream). I believe that that number is when the node
produced its first row to send upstream, and 2nd number is when it produced
its last row. Since a bitmap index scan only produces one "row" (the
bitmap itself), these numbers will always be the same. In other words, the
"actual time" field does not give you measure of the start-up time of the
node. Indeed, there is no easy way to figure that out from the output of
EXPLAIN. Or at least this is my understanding from trial and error; this
doesn't seem to be documented anywhere.

What tells you that the bitmap creation is fast is that it gets much
faster when run on already-cached data, so the time is going to reading in
data, not turning the data into the bitmap.
I think what I need to figure out is:
1) Why does it take 36 seconds to set up the general index caches?
They are not general index caches, just general data caches. The index
pages compete with all the other data in the system. Anyway, running the
explains as "explain (analyze, buffers)" would go a long way towards
figuring out why it takes so long to read the index, especially if you can
set track_io_timing = on first.
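
Concretely, something like this (a sketch; the exact query is whatever your
audit report runs against the January log):

set track_io_timing = on;  -- per-session setting; available in 9.2 and later
explain (analyze, buffers)
select * from log_2013_01 where session_id = 27;

The buffers counts show how many pages were found in shared_buffers versus
read in, and with track_io_timing on you also get the time spent on those
reads.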

And then the next question would be, once they are in the cache, why don't
they stay there? For that you would have to know what other types of
activities are going on that might be driving the data out of the cache.

Cheers,

Jeff
Carlo Stonebanks
2013-02-25 17:04:19 UTC
Hi Jeff, thanks for the insight.



<< And then the next question would be, once they are in the cache, why
don't they stay there? For that you would have to know what other types of
activities are going on that might be driving the data out of the cache.
>>

To give you an idea of the activity level, each physical machine hosts
multiple DB's with the same structure - one DB per client.



We run automated ETL processes which digest client feeds (E), normalize
them (T) and then store them in our DB (L).



Looking at the stats from our audit log, the average feed load is 4 hours,
divided up into 14 client sessions. Each session averages about 50 write
(update, insert, no deletes) operations per second, representing 700 write
operations per second. The ratio of reads per write is pretty high as the
system goes through the transformation process.



Since I don't know how this compares to other PG installations, the idea
of using periodic REINDEX and CLUSTER brings up these questions:



1) Because we are hosting multiple DB's, what is the impact on OS and
disk caches?

2) Is there an automated CLUSTER and REINDEX strategy that will not
interfere with normal operations?

3) By PG standards, is this a busy DB - and does that explain why the
general caches expire?



Thanks,



Carlo
Jeff Janes
2013-02-26 18:11:57 UTC
On Mon, Feb 25, 2013 at 9:04 AM, Carlo Stonebanks <
Hi Jeff, thanks for the insight.

<< And then the next question would be, once they are in the cache, why
don't they stay there? For that you would have to know what other types of
activities are going on that might be driving the data out of the cache.
>>

To give you an idea of the activity level, each physical machine hosts
multiple DB’s with the same structure – one DB per client.

We run automated ETL processes which digest client feeds (E), normalize
them (T) and then store them in our DB (L).
Looking at the stats from our audit log, the average feed load is 4 hours,
divided up into 14 client sessions. Each session averages about 50 write
(update, insert, no deletes) operations per second, representing 700 write
operations per second.
Is each of these write operations just covering a single row? Does this
description apply to just one of the many (how many?) databases, so that
there are really 14*N concurrent sessions?
The ratio of reads per write is pretty high as the system goes through the
transformation process.

Since I don’t know how this compares to other PG installations, the idea
of using periodic REINDEX and CLUSTER brings up these questions:

1) Because we are hosting multiple DB’s, what is the impact on
OS and disk caches?
They have to share the RAM. One strategy would be to run ETL processes only
one at a time, rather than trying to run several concurrently, if that is
what you are doing. That way you can concentrate one customer's data in
RAM, and then another's, to reduce the competition.
2) Is there an automated CLUSTER and REINDEX strategy that will
not interfere with normal operations?
3) By PG standards, is this a busy DB - and does that explain why the
general caches expire?
You really need to know whether those reads and writes are concentrated in
a small region (relative to the amount of your RAM), or widely scattered.
If you are reading and writing intensively (which you do seem to be doing)
but only within a compact region, then it should not drive other data out
of the cache. But, since you do seem to have IO problems from cache
misses, and you do have a high level of activity, the easy conclusion is
that you have too little RAM to hold the working size of your data.

Cheers,

Jeff
Carlo Stonebanks
2013-02-27 00:33:13 UTC
<<Is each of these write operations just covering a single row? Does this
description apply to just one of the many (how many?) databases, so that
there are really 14*N concurrent sessions?
>>

All writes are single row. All DB's have exactly the same structure, only
the content is different. Currently the server is hosting five active DB's -
although there are 14 DB's actually on the host, the balance are backups
and/or testing environments.



When a feed comes in, it can be anything from dozens to millions of rows,
and may take minutes or days to run. I had asked that PG bouncer be
installed in front of the host to act as a traffic cop. Try as I may to
convince the engineering team that fewer sessions running faster is optimal,
they say that the 14 concurrent sessions is based on real-world experience
of what imports the fastest.



<< You really need to know whether those reads and writes are concentrated
in a small region (relative to the amount of your RAM), or widely scattered.
If you are reading and writing intensively (which you do seem to be doing)
but only within a compact region, then it should not drive other data out of
the cache. But, since you do seem to have IO problems from cache misses,
and you do have a high level of activity, the easy conclusion is that you
have too little RAM to hold the working size of your data.
>>

It won't be a problem of physical RAM; I believe there is at least 32GB of
RAM. What constitutes "a compact region"? The ETL process takes the feed and
distributes it to 85 core tables. I have been through many PG configuration
cycles with the generous help of people in this forum. I think the big
problem when getting help has been getting those offering assistance to
understand that the whopping majority of the time, the system is
performing single-row reads and writes. The assumption tends to be that the
end point of an ETL should just be a series of COPY statements, and it
should all happen very quickly in classic SQL bulk queries.
Jeff Janes
2013-02-27 18:03:50 UTC
On Tue, Feb 26, 2013 at 4:33 PM, Carlo Stonebanks <
Post by Carlo Stonebanks
<<Is each of these write operations just covering a single row? Does this
description apply to just one of the many (how many?) databases, so that
there are really 14*N concurrent sessions?
>>

All writes are single row. All DB’s have exactly the same structure, only
the content is different. Currently the server is hosting five active DB’s
– although there are 14 DB’s actually on the host, the balance are backups
and/or testing environments.
I had thought you were saying that any one ETL procedure into one database
used 14 concurrent threads. But really, each ETL procedure is
single-threaded, and there can be up to 5 (or theoretically up to 14) of
them running at a time into different databases?
Post by Carlo Stonebanks
When a feed comes in, it can be anything from dozens to millions of rows,
and may take minutes or days to run. I had asked that PG bouncer be
installed in front of the host to act as a traffic cop. Try as I may to
convince the engineering team that fewer sessions running faster is
optimal, they say that the 14 concurrent sessions is based on real-world
experience of what imports the fastest.
pgbouncer is more for making connections line up single-file when the line
is moving at a very fast clip, say 0.01 second per turn. If I were trying
to make tasks that can each last for hours or days line up and take turns,
I don't think pgbouncer would be the way to go.
Post by Carlo Stonebanks
<< You really need to know whether those reads and writes are
concentrated in a small region (relative to the amount of your RAM), or
widely scattered. If you are reading and writing intensively (which you do
seem to be doing) but only within a compact region, then it should not
drive other data out of the cache. But, since you do seem to have IO
problems from cache misses, and you do have a high level of activity, the
easy conclusion is that you have too little RAM to hold the working size of
your data.
>>

It won’t be a problem of physical RAM; I believe there is at least 32GB of
RAM. What constitutes “a compact region”?
If you have 14 actively going on simultaneously, I'd say a compact region
would then be about 512 MB (32GB / 14 / a margin of safety of 4). Again,
assuming that that is the problem.
Post by Carlo Stonebanks
The ETL process takes the feed and distributes it to 85 core tables. I
have been through many PG configuration cycles with the generous help of
people in this forum. I think the big problem when getting help has been
getting those offering assistance to understand that the whopping
majority of the time, the system is performing single-row reads and writes.
The assumption tends to be that the end point of an ETL should just be a
series of COPY statements, and it should all happen very quickly in classic
SQL bulk queries.
That is often a reasonable assumption, as ETL does end with L :)

Is the original query you posted part of the transform process, rather than
being the production query you run after the ETL is over?

If so, maybe you need an EL(S)TL process, where you first load the data to
a staging table in bulk, and then transform it in bulk rather than one row
at a time.
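
Something along these lines (a sketch only; the staging and core table
names, and the feed file path, are all assumed):

-- load the raw feed in one bulk operation
create table feed_staging (like core_table including defaults);
copy feed_staging from '/path/to/feed.csv' with (format csv);
-- then transform set-wise rather than row by row
insert into core_table
select * from feed_staging;  -- the select list is where the transforms go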

Cheers,

Jeff
Carlo Stonebanks
2013-02-27 21:38:29 UTC
<< pgbouncer is more for making connections line up single-file when the
line is moving at a very fast clip, say 0.01 second per turn. If I were
trying to make tasks that can each last for hours or days line up and take
turns, I don't think pgbouncer would be the way to go.
>>

The recommendation at the time was assuming that write contention was
slowing things down and consuming resources, since I can't stop people from
creating big multi-threaded imports. Each import consists of about 50 writes
Post by Jeff Janes
Is the original query you posted part of the transform process, rather
than being the production query you run after the ETL is over?



Neither, it is part of our auditing and maintenance processes. It is not
called with any great frequency. The audit report generates rows defining
how a particular item (an "item" being a particular table/row) was
created: it returns the names of the import tables, the row ids, the write
operations and any transformation messages that may have been generated -
all in the order they occurred.



You can imagine how useful this is in creating a document describing what
happened and why.



The same data generated by the report is used to "resurrect" an item. If -
for example - our business logic has changed, but the change only affects a
small sub-set of our core data, then we perform a "rollback" (a logical
cascading delete) on the affected items. Then we create a "rebuild" which is
a script that is generated to re-import ONLY the import table rows defined
in the audit report.



So, this query is not called often, but the fact is that if it takes over 30
seconds to load an item (because the audit report takes so long to prepare
the bitmap index scan when passed new query parameters) then it severely
restricts how much data we can resurrect at any one time.
Jeff Janes
2013-03-05 21:20:32 UTC
On Wed, Feb 27, 2013 at 1:38 PM, Carlo Stonebanks <
Post by Carlo Stonebanks
Post by Jeff Janes
Is the original query you posted part of the transform process, rather
than being the production query you run after the ETL is over?
Neither, it is part of our auditing and maintenance processes. It is not
called with any great frequency. The audit report generates rows defining
how a particular item (an “item” being a particular table/row) was
created: it returns the names of the import tables, the row ids, the write
operations and any transformation messages that may have been generated –
all in the order they occurred.
...
Post by Carlo Stonebanks
So, this query is not called often, but the fact is that if it takes over
30 seconds to load an item (because the audit report takes so long to
prepare the bitmap index scan when passed new query parameters) then it
severely restricts how much data we can resurrect at any one time.
Is that a restriction you have observed, or are you extrapolating based on
a single query? If you run a bunch of similar queries in close succession,
it is likely that the first few queries will warm up the cache, and
following queries will then run much faster. Also, if you restructure the
series of queries into a large one that reconstructs many rows
simultaneously, it might choose a more efficient path than if it is fed the
queries one at a time.

Cheers,

Jeff
Carlo Stonebanks
2013-03-08 21:27:54 UTC
Sorry this took so long to get back to you. Here is where we were:

I said: <<

So, this query is not called often, but the fact is that if it takes over 30
seconds to load an item (because the audit report takes so long to prepare
the bitmap index scan when passed new query parameters) then it severely
restricts how much data we can resurrect at any one time.
>>

Your reply: <<
Is that a restriction you have observed, or are you extrapolating based on a
single query? If you run a bunch of similar queries in close succession, it
is likely that the first few queries will warm up the cache, and following
queries will then run much faster. Also, if you restructure the series of
queries into a large one that reconstructs many rows simultaneously, it
might choose a more efficient path than if it is fed the queries one at a
time.
>>

Actual observation. The first run with a new parameter actually takes 90
seconds. Another run with the same parameter takes 15-30 seconds. Running
the query immediately afterwards with different parameters starts with a new
90-second query. Unfortunately, since going to Linux, our sys ops hiss and
snarl at anyone who comes anywhere near the machine or DB server configs, so
I am no longer well informed on how well optimized the machines are.



Ultimately, the machines need to be optimized by an expert. As I mentioned
before, our ETL is entirely single-load reads-and-writes (I didn't go into
the "why" of this because the nature of the data and the product dictates
this). And this is an example of one of the few complex joins that return
hundreds/thousands of rows. The problem is that a full index scan has to be
done before we can start building the results. So, if clustering will help
such that the index scan KNOWS that there's no point in scanning the rest of
the index because we've gone beyond the maximum value in the list of
possible values, then that would help, as each table being scanned has 50 -
100 million rows (there is one table for every month of production).



As always, thanks.




Carlo Stonebanks
2013-02-28 05:12:48 UTC
<<

I had thought you were saying that any one ETL procedure into one database
used 14 concurrent threads. But really, each ETL procedure is
single-threaded, and there can be up to 5 (or theoretically up to 14) of
them running at a time into different databases?
>>

Sorry, just caught this.



Your first interpretation was correct. Each DB runs an ETL that can have up
to 14 concurrent threads. I don't think the number should be that high, but
the engineering team insists the load time is better than with fewer threads
running faster.
Jeff Janes
2013-02-23 16:15:12 UTC
Hi Jeff, thanks for the reply.

<<
What is going on during the interregnum? Whatever it is, it seems to be
driving the log_2013_01_session_idx index out of the cache, but not the
log_2013_01 table. (Or perhaps the table visit is getting the benefit of
effective_io_concurrency?)
...
Rebuilding the index might help, as it would put all the leaf pages
holding values for session_id=27 adjacent to each other, so they would read
from disk faster. But with a name like "session_id", I don't know how long
such clustering would last though.
>>
Technically, nothing should be happening. We used to keep one massive
audit log, and it was impossible to manage due to its size. We then changed to
a strategy where every month a new audit log would be spawned, and since
log_2013_01 represents January, the log should be closed and nothing should
have changed (it is technically possible that a long-running process would
spill over into February, but not by this much). So, assuming that it’s
stable, it should be a very good candidate for reindexing, no?
Yes, assuming the problem is reading the index data from disk, that sounds
like a good candidate for reindexing (and maybe clustering as well).
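
For example (a sketch, using the index name from your plan):

reindex index log_2013_01_session_idx;

One caveat: REINDEX locks out writes to the parent table and blocks reads
that would use that index while it runs, so it still needs scheduling.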

Our effective_io_concurrency is 1, and last I heard the PG host was a
Linux 4-drive RAID10, so I don’t know if there is any benefit to raising
this number – and if there was any benefit, it would be to the Bitmap Scan,
and the problem is the data building before the fact.

<<
the bitmap itself doesn't get cached. But the data needed to
construct the bitmap does get cached. It gets cached by the generic
caching methods of PG and the OS, not through something specific to bitmaps.
>>
This has always been a problem for me. I spend hours trying different
strategies and think I’ve solved the problem, when in fact it seems like a
cache has spun up, and then something else expires it and the problem is
back. Is there a way around this problem? Can I force the expiration of a
cache?
You can clear the PG cache by restarting the instance. To clear the OS
cache as well you can do this (Linux)

<stop postgres>
sync
echo 3 | sudo tee /proc/sys/vm/drop_caches  # sudo must cover the write, not just the echo
<start postgres>


But I think it would be better just not to execute the same query
repeatedly. For example, each time you execute it during testing, pick a
different session_id rather than using 27 repeatedly. (It might also be a
good idea to change up the hard-coded in-list values you have, but with the
plans you are currently seeing that isn't important as those are being used
in a filter not a look-up)

Cheers,

Jeff
Carlo Stonebanks
2013-02-28 20:13:50 UTC
<<Could you use CLUSTER on the table after it had been closed off? If
appropriate, that should make the queries run much faster, as related entries
will be in the same or nearby blocks on disk.
>>

Technically, yes. That would really help, but the issue is scheduling.
Although the logs are closed off for writes, they aren't closed off for
reads, ref PG documentation: "When a table is being clustered, an ACCESS
EXCLUSIVE lock is acquired on it. This prevents any other database
operations (both reads and writes) from operating on the table until the
CLUSTER is finished."



Not ideal, but a lot better than doing nothing at all!
Jeff Janes
2013-03-05 17:46:34 UTC
On Thu, Feb 28, 2013 at 12:13 PM, Carlo Stonebanks <
Post by Carlo Stonebanks
<<Could you use CLUSTER on the table after it had been closed off? If
appropriate, that should make the queries run much faster, as related
entries will be in the same or nearby blocks on disk.
>>
Technically, yes. That would really help, but the issue is scheduling.
Although the logs are closed off for writes, they aren’t closed off for
reads, ref PG documentation: “When a table is being clustered, an ACCESS
EXCLUSIVE lock is acquired on it. This prevents any other database
operations (both reads and writes) from operating on the table until the
CLUSTER is finished.”
Not ideal, but a lot better than doing nothing at all!
Since it is read only, you could make a copy of the table, cluster the copy
(or just do the sorting while you make the copy), and then atomically swap
the two tables by renaming them inside a single transaction.

The swap process will take an exclusive lock, but it will only last for a
fraction of a second rather than the duration of the clustering operation.
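
A sketch of that approach (names assumed; any views or foreign keys
pointing at the old table would also need attention):

begin;
-- build a pre-sorted copy; the order by does the "clustering"
create table log_2013_01_new as
select * from log_2013_01 order by session_id;
create index log_2013_01_new_session_idx
on log_2013_01_new (session_id);
-- the atomic swap: only these renames take the exclusive lock
alter table log_2013_01 rename to log_2013_01_old;
alter table log_2013_01_new rename to log_2013_01;
commit;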

Cheers,

Jeff