Discussion:
Question about postmaster's CPU usage
kelphet xiong
2013-03-28 19:07:01 UTC
Hi all,

When I use postgres and issue a simple sequential scan on a table
named inventory with the query "select * from inventory;", I can see from "top" that
postmaster is using 100% CPU, which limits the query execution time. My
question is: why is the CPU the bottleneck here, and what is postmaster
doing? Is there any way to improve the performance? Thanks!

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15401 postgres 20 0 4371m 517m 515m R 99.8 3.2 0:30.14 postmaster

Query: select * from inventory;

explain analyze select * from inventory;
                                                         QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Seq Scan on inventory  (cost=0.00..180937.00 rows=11745000 width=16) (actual time=0.005..1030.403 rows=11745000 loops=1)
 Total runtime: 1750.889 ms
(2 rows)

Table "public.inventory"
Column | Type | Modifiers
----------------------+---------+-----------
inv_date_sk | integer | not null
inv_item_sk | integer | not null
inv_warehouse_sk | integer | not null
inv_quantity_on_hand | integer |
Indexes:
"inventory_pkey" PRIMARY KEY, btree (inv_date_sk, inv_item_sk,
inv_warehouse_sk)
Kevin Grittner
2013-03-28 21:03:42 UTC
Post by kelphet xiong
When I use postgres and issue a simple sequential scan on a
table named inventory with the query "select * from inventory;", I can
see from "top" that postmaster is using 100% CPU, which limits the
query execution time. My question is: why is the CPU the
bottleneck here, and what is postmaster doing? Is there any way to
improve the performance? Thanks!
explain analyze select * from inventory;
Seq Scan on inventory  (cost=0.00..180937.00 rows=11745000 width=16) (actual time=0.005..1030.403 rows=11745000 loops=1)
  Total runtime: 1750.889 ms
So it is reading and returning 11.7 million rows in about 1 second,
or about 88 nanoseconds (billionths of a second) per row.  You
can't be waiting for a hard drive for many of those reads, or it
would take a lot longer, so the bottleneck is the CPU pushing the
data around in RAM.  I'm not sure why 100% CPU usage would surprise
you.  Are you wondering why the CPU works on the query straight
through until it is done, rather than taking a break periodically
and letting the unfinished work sit there?
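As a rough check of that arithmetic, using only the numbers from the plan
above:

  -- 1030.403 ms of scan time spread across 11,745,000 rows,
  -- converted to nanoseconds per row:
  SELECT 1030.403 / 11745000 * 1000000 AS ns_per_row;  -- roughly 87.7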

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
k***@rice.edu
2013-03-28 21:20:59 UTC
Post by Kevin Grittner
Post by kelphet xiong
When I use postgres and issue a simple sequential scan on a
table named inventory with the query "select * from inventory;", I can
see from "top" that postmaster is using 100% CPU, which limits the
query execution time. My question is: why is the CPU the
bottleneck here, and what is postmaster doing? Is there any way to
improve the performance? Thanks!
explain analyze select * from inventory;
Seq Scan on inventory  (cost=0.00..180937.00 rows=11745000 width=16) (actual time=0.005..1030.403 rows=11745000 loops=1)
  Total runtime: 1750.889 ms
So it is reading and returning 11.7 million rows in about 1 second,
or about 88 nanoseconds (billionths of a second) per row.  You
can't be waiting for a hard drive for many of those reads, or it
would take a lot longer, so the bottleneck is the CPU pushing the
data around in RAM.  I'm not sure why 100% CPU usage would surprise
you.  Are you wondering why the CPU works on the query straight
through until it is done, rather than taking a break periodically
and letting the unfinished work sit there?
--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Alternatively, since CPU is the bottleneck in this case, you could
purchase a faster CPU, or partition the work into parallel queries that
can each use a separate processor (a sketch follows below).
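For illustration, a minimal sketch of that kind of partitioning, splitting
on the leading primary-key column (the boundary value below is made up and
would need to be chosen so the chunks come out roughly equal):

  -- Run each statement from its own connection so each gets its own backend.
  SELECT * FROM inventory WHERE inv_date_sk <  2451500;   -- session 1
  SELECT * FROM inventory WHERE inv_date_sk >= 2451500;   -- session 2

Whether this helps in practice depends on the client being able to consume
both result streams at the same time.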

Regards,
Ken
Kelphet Xiong
2013-03-31 04:00:02 UTC
Thanks a lot for the replies from Kevin, Ken, and Ants Aasma. I really
appreciate your suggestions and comments.



My server has two physical quad-core CPUs with hyper-threading enabled.
Each CPU is an Intel(R) Xeon(R) CPU ***@2.40GHz. Physical memory is 16GB.
I set shared_buffers to 4GB and effective_cache_size to 10GB, and the
inventory table is around 500MB.
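For reference, the same numbers as they show up from a psql session (the
size check uses the table from my first message; these are the values as
set, not recommendations):

  SHOW shared_buffers;        -- 4GB
  SHOW effective_cache_size;  -- 10GB
  SELECT pg_size_pretty(pg_total_relation_size('inventory'));  -- about 500 MB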



From the information provided by the top command, although the row for
postmaster shows that postmaster is using 100% CPU,
the total CPU user time for the whole server never goes beyond 6.6%us.
I guess it is because postgres only uses a single thread to read the data,
or to "push the data around in RAM" as Kevin put it.
Then my question is really why postgres cannot use the remaining 93.4% of the CPU.
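As a sanity check on those percentages (assuming top is averaging over all
logical CPUs):

  -- 2 sockets x 4 cores x 2 hyper-threads = 16 logical CPUs, so one core
  -- pegged at 100% is about 1/16 of the whole machine:
  SELECT round(100.0 / 16, 2) AS pct_of_total_cpu;  -- 6.25, close to the 6.6%us above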



Btw, I also tried the command suggested by Ants Aasma, but got an error:
explain (analyze on, timing off) select * from inventory;
ERROR:  syntax error at or near "analyze"
LINE 1: explain (analyze on, timing off) select * from inventory;
                 ^

Thanks!

Best regards
Kelphet Xiong
Post by Kevin Grittner
Post by kelphet xiong
When I use postgres and issue a simple sequential scan on a
table named inventory with the query "select * from inventory;", I can
see from "top" that postmaster is using 100% CPU, which limits the
query execution time. My question is: why is the CPU the
bottleneck here, and what is postmaster doing? Is there any way to
improve the performance? Thanks!
explain analyze select * from inventory;
Seq Scan on inventory (cost=0.00..180937.00 rows=11745000 width=16) (actual time=0.005..1030.403 rows=11745000 loops=1)
Total runtime: 1750.889 ms
So it is reading and returning 11.7 million rows in about 1 second,
or about 88 nanoseconds (billionths of a second) per row. You
can't be waiting for a hard drive for many of those reads, or it
would take a lot longer, so the bottleneck is the CPU pushing the
data around in RAM. I'm not sure why 100% CPU usage would surprise
you. Are you wondering why the CPU works on the query straight
through until it is done, rather than taking a break periodically
and letting the unfinished work sit there?
--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Merlin Moncure
2013-04-01 13:42:42 UTC
Post by Kelphet Xiong
I guess it is because postgres only uses a single thread to read the data,
or to "push the data around in RAM" as Kevin put it.
Then my question is really why postgres cannot use the remaining 93.4% of the CPU.
Postgres can use an arbitrary number of backend processes to read data, but
only one per database connection.
Post by Kelphet Xiong
explain (analyze on, timing off) select * from inventory;
ERROR: syntax error at or near "analyze"
LINE 1: explain (analyze on, timing off) select * from inventory;
^
Ability to manipulate timing was added in 9.2.
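For example, on 9.2 or later the option list works as written, while the
syntax error above suggests an older server where only the plain form is
available (and per-row timing is always measured):

  SELECT version();                                          -- confirm the server version
  EXPLAIN (ANALYZE ON, TIMING OFF) SELECT * FROM inventory;  -- 9.2 or later
  EXPLAIN ANALYZE SELECT * FROM inventory;                   -- older releases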

merlin
Ants Aasma
2013-03-30 23:45:31 UTC
Post by kelphet xiong
explain analyze select * from inventory;
                                                         QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Seq Scan on inventory  (cost=0.00..180937.00 rows=11745000 width=16) (actual time=0.005..1030.403 rows=11745000 loops=1)
 Total runtime: 1750.889 ms
(2 rows)
A large fraction of that time, if not most of it, is due to timing overhead.
You can try the same query without per-row timing by using:
explain (analyze on, timing off) select * from inventory;

Regards,
Ants Aasma