Discussion:
Insert performance for large transaction with multiple COPY FROM
Horst Dehmer
2013-01-11 23:55:36 UTC
Hi!

I see a massive performance drop when writing a large transaction. I'm writing data for 33 tables with COPY FROM, directly from streams in Scala/Java. Across all tables there are 2.2M records, unevenly distributed: from a single record in some tables up to 315k records in others.

For comparison I ran a test using UNLOGGED tables with no PK/FK constraints, no other constraints or indexes, and no triggers on any table. The insert rate for this scenario is well above 105k records/second across all tables (which I think is really cool!).

Turning everything on (but still with UNLOGGED tables), i.e. PK/FK constraints, additional indexes, some column check constraints and a trigger for each table which basically inserts one additional record into another table, the rates dropped, as expected, to around 6k to 7k records/second.

Except - and that's the wall I'm hitting - for one table, which yielded just 75 records/second.
The main 'problem' seems to be the FK constraints: dropping just them restored insert performance for this table to 6k records/s. The table in question has a composite PK (3 columns), 3 foreign keys and a bunch of indexes (see table obj_item_loc at the end of the mail). Compared to the other 32 tables, nothing unusual.
I'd gladly supply more information if necessary.

Dropping and recreating constraints/indexes is (for now) not a viable alternative, since I have to write such transactions into an already populated database.
What I'm trying to understand is which limit I'm hitting here. I need some advice on how to 'profile' this situation.

Configuration is more or less standard, except for the WAL settings (which should not be relevant here).
EnterpriseDB One Click installer.

Any hint is really appreciated.
Thanks!

--
Horst Dehmer



"PostgreSQL 9.2.1 on x86_64-apple-darwin, compiled by i686-apple-darwin11-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.9.00), 64-bit"
OS X 10.8.2
Mid-2012 MacBook Pro 16 GB, 512 GB SSD

"bytea_output";"escape"
"checkpoint_completion_target";"0.9"
"checkpoint_segments";"32"
"client_encoding";"UNICODE"
"client_min_messages";"notice"
"lc_collate";"en_US.UTF-8"
"lc_ctype";"en_US.UTF-8"
"listen_addresses";"*"
"log_checkpoints";"on"
"log_destination";"stderr"
"log_line_prefix";"%t "
"logging_collector";"on"
"max_connections";"100"
"max_stack_depth";"2MB"
"port";"5432"
"server_encoding";"UTF8"
"shared_buffers";"24MB"
"TimeZone";"Europe/Vienna"
"wal_buffers";"768kB"

/etc/sysctl.conf
kern.sysv.shmmax=1610612736
kern.sysv.shmall=393216
kern.sysv.shmmin=1
kern.sysv.shmmni=256
kern.sysv.shmseg=64
kern.maxprocperuid=512
kern.maxproc=2048

CREATE TABLE obj_item_loc
(
obj_item_id numeric(20,0) NOT NULL,
loc_id numeric(20,0) NOT NULL,
obj_item_loc_ix numeric(20,0) NOT NULL,
ver_acc_dim numeric(12,3),
horz_acc_dim numeric(12,3),
brng_angle numeric(7,4),
brng_acc_angle numeric(7,4),
brng_precision_code character varying(6),
incl_angle numeric(7,4),
incl_acc_angle numeric(7,4),
incl_precision_code character varying(6),
speed_rate numeric(8,4),
speed_acc_rate numeric(8,4),
speed_precision_code character varying(6),
meaning_code character varying(6),
rel_speed_code character varying(6),
rptd_id numeric(20,0) NOT NULL,
creator_id numeric(20,0) NOT NULL,
update_seqnr numeric(15,0) NOT NULL,
rec_id bigint DEFAULT nextval('rec_seq'::regclass),
CONSTRAINT obj_item_loc_pkey PRIMARY KEY (obj_item_id, loc_id, obj_item_loc_ix),
CONSTRAINT obj_item_loc_4fbc75641175ef1757ca310dd34e34ee_fkey FOREIGN KEY (obj_item_id)
REFERENCES obj_item (obj_item_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT obj_item_loc_7895d64f5557b1e382c36d41212a3696_fkey FOREIGN KEY (rptd_id)
REFERENCES rptd (rptd_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT obj_item_loc_8d919243f69bcc599873caca07ac9888_fkey FOREIGN KEY (loc_id)
REFERENCES loc (loc_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT obj_item_loc_brng_acc_angle_ra_check CHECK (br_check_305(brng_acc_angle >= 0::numeric AND brng_acc_angle <= 359.9999, 'obj_item_loc'::text, 'brng_acc_angle'::text, brng_acc_angle::text)),
CONSTRAINT obj_item_loc_brng_angle_ra_check CHECK (br_check_305(brng_angle >= 0::numeric AND brng_angle <= 359.9999, 'obj_item_loc'::text, 'brng_angle'::text, brng_angle::text)),
CONSTRAINT obj_item_loc_brng_precision_code_check CHECK (br_check_305(brng_precision_code::text = ANY (ARRAY['1000MN'::text, '100MN'::text, '100SEC'::text, '10DEG'::text, '10MN'::text, '10SEC'::text, 'DEGREE'::text, 'MIL'::text, 'MINUTE'::text, 'SECOND'::text]), 'obj_item_loc'::text, 'brng_precision_code'::text, brng_precision_code::text)),
CONSTRAINT obj_item_loc_incl_acc_angle_ra_check CHECK (br_check_305(incl_acc_angle >= 0::numeric AND incl_acc_angle <= 359.9999, 'obj_item_loc'::text, 'incl_acc_angle'::text, incl_acc_angle::text)),
CONSTRAINT obj_item_loc_incl_angle_ra_check CHECK (br_check_305(incl_angle >= 0::numeric AND incl_angle <= 359.9999, 'obj_item_loc'::text, 'incl_angle'::text, incl_angle::text)),
CONSTRAINT obj_item_loc_incl_precision_code_check CHECK (br_check_305(incl_precision_code::text = ANY (ARRAY['1000MN'::text, '100MN'::text, '100SEC'::text, '10DEG'::text, '10MN'::text, '10SEC'::text, 'DEGREE'::text, 'MIL'::text, 'MINUTE'::text, 'SECOND'::text]), 'obj_item_loc'::text, 'incl_precision_code'::text, incl_precision_code::text)),
CONSTRAINT obj_item_loc_meaning_code_check CHECK (br_check_305(meaning_code::text = ANY (ARRAY['CEOFMA'::text, 'SHAPE'::text, 'LNBRNG'::text, 'ASSCP'::text, 'COM'::text, 'CMDDET'::text, 'SOUND'::text, 'DSPCTR'::text, 'FRMCTR'::text, 'POSOIM'::text, 'CTRMNB'::text, 'ORGPRL'::text, 'STDPOS'::text, 'ADRPRP'::text]), 'obj_item_loc'::text, 'meaning_code'::text, meaning_code::text)),
CONSTRAINT obj_item_loc_rel_speed_code_check CHECK (br_check_305(rel_speed_code::text = ANY (ARRAY['FAST'::text, 'MEDIUM'::text, 'ZERO'::text, 'SLOW'::text]), 'obj_item_loc'::text, 'rel_speed_code'::text, rel_speed_code::text)),
CONSTRAINT obj_item_loc_speed_precision_code_check CHECK (br_check_305(speed_precision_code::text = ANY (ARRAY['KPH'::text, 'KNOTS'::text, 'MPS'::text]), 'obj_item_loc'::text, 'speed_precision_code'::text, speed_precision_code::text))
)
WITH (
OIDS=FALSE,
autovacuum_enabled=true
);
ALTER TABLE obj_item_loc
OWNER TO postgres;

CREATE INDEX obj_item_loc_ix_rec_id
ON obj_item_loc
USING btree
(rec_id);

CREATE INDEX obj_item_loc_loc_id_idx
ON obj_item_loc
USING btree
(loc_id);

CREATE INDEX obj_item_loc_obj_item_id_idx
ON obj_item_loc
USING btree
(obj_item_id);

CREATE INDEX obj_item_loc_rptd_id_idx
ON obj_item_loc
USING btree
(rptd_id);

CREATE TRIGGER trg_01_obj_item_loc_before_insert
BEFORE INSERT
ON obj_item_loc
FOR EACH ROW
EXECUTE PROCEDURE obj_item_loc_before_insert();
Jeff Janes
2013-01-12 01:17:40 UTC
Post by Horst Dehmer
Except - and that's the wall I'm hitting - for one table which yielded
just 75 records/second.
The main 'problem' seems to be the FK constraints. Dropping just them
restored insert performance for this table to 6k records/s.
It sure sounds like you don't have enough RAM to hold the foreign-key table
data needed to check the constraints, so every insert needs one disk
revolution to fetch the data.

If you drop the indexes and constraints one at a time until it speeds up,
is there a certain one that is the culprit?

You can look in pg_statio_user_tables to see what tables and indexes have
high io being driven by the bulk loading.
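A sketch of such a check (hypothetical query, not the OP's; reset the counters immediately before the load so the numbers cover only the bulk insert):

```sql
-- Reset the statistics counters so they cover only the upcoming load.
SELECT pg_stat_reset();

-- ... run the COPY FROM bulk load here ...

-- Tables with the heaviest block I/O accumulated during the load.
SELECT relname, heap_blks_read, heap_blks_hit,
       idx_blks_read, idx_blks_hit
FROM pg_statio_user_tables
ORDER BY heap_blks_read + idx_blks_read DESC
LIMIT 10;
```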

Use "top" to see if the server is mostly IO bound or CPU bound.

Cheers,

Jeff
Horst Dehmer
2013-01-15 23:44:29 UTC
After more testing I have gained some insights:

The foreign key constraints are NOT responsible for the low COPY FROM performance in my case. I had forgotten about the indexes I created along with the FK constraints.

Besides the primary key

CONSTRAINT obj_item_loc_pkey PRIMARY KEY (obj_item_id, loc_id, obj_item_loc_ix),

the table OBJ_ITEM_LOC has four additional indexes (let's call them idx_1 through idx_4)

CREATE INDEX idx_1 ON obj_item_loc USING btree (rec_id);
CREATE INDEX idx_2 ON obj_item_loc USING btree (loc_id);
CREATE INDEX idx_3 ON obj_item_loc USING btree (rptd_id);
CREATE INDEX idx_4 ON obj_item_loc USING btree (obj_item_id);

The indexes 2 to 4 are intended to speed up joins between OBJ_ITEM_LOC and
LOC (loc_id), RPTD (rptd_id) and OBJ_ITEM (obj_item_id) respectively (and I'm highly doubtful whether this makes sense at all).

idx_4, together with a simple select in the table's on-insert trigger, is slowing things down considerably.
With idx_4 and the trigger the rates are:

44100 rows, 0:00:04.576, 9637 r/s: LOC
2101 rows, 0:00:00.221, 9506 r/s: OBJ_ITEM
2101 rows, 0:00:00.278, 7557 r/s: ORG
94713 rows, 0:00:18.502, 5119 r/s: RPTD
44100 rows, 0:03:03.437, 240 r/s: OBJ_ITEM_LOC
imported 187115 records in 0:03:27.081 => 903 r/s

pg_statio comes up with some big numbers (reads = bad, hits = not so bad?):

relname | heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit
--------------+----------------+---------------+---------------+--------------
obj_item_loc | 1262 | 9908013 | 1199 | 1682005
rptd | 4434 | 279022 | 1806 | 1270746
org | 38 | 191559 | 19 | 201071
obj_item | 84 | 92476 | 29 | 104134
loc | 768 | 88902 | 597 | 352680
(5 rows)

Dropping idx_1, idx_2 and idx_3 at the same time has no significant impact. But taking away idx_4 only:

44100 rows, 0:00:04.558, 9675 r/s: LOC
2101 rows, 0:00:00.220, 9593 r/s: OBJ_ITEM
2101 rows, 0:00:00.275, 7640 r/s: ORG
94713 rows, 0:00:18.407, 5145 r/s: RPTD
44100 rows, 0:00:11.433, 3857 r/s: OBJ_ITEM_LOC
imported 187115 records in 0:00:34.938 => 5355 r/s

Hm, not bad. Now for the select statement in the on-insert trigger:

SELECT *
FROM obj_item_loc
WHERE obj_item_loc.obj_item_id = NEW.obj_item_id
AND obj_item_loc.loc_id = NEW.loc_id
AND obj_item_loc.obj_item_loc_ix = NEW.obj_item_loc_ix
INTO old;

Executing this query AFTER the bulk insert (and probably some auto-vacuuming), the query plan looks like this:

explain analyze
select *
from obj_item_loc
where (obj_item_id, loc_id, obj_item_loc_ix) =
(10903011224100014650,10903010224100089226,10900024100000140894)

QUERY PLAN
--------------
Index Scan using obj_item_loc_loc_id_idx on obj_item_loc
(cost=0.00..8.36 rows=1 width=329)
(actual time=0.039..0.040 rows=1 loops=1)
Index Cond: (loc_id = 10903010224100089226::numeric)
Filter: ((obj_item_id = 10903011224100014650::numeric) AND
(obj_item_loc_ix = 10900024100000140894::numeric))
Total runtime: 0.079 ms

After some head-scratching I realized that obj_item_id references a meager 2101 rows, which probably does not make it a good index candidate. So the query plan makes some sense, I guess.
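One way to check that hunch directly (a sketch; pg_stats only reflects the state as of the last analyze) is to look at the planner's per-column distinct-value estimates:

```sql
-- Estimated distinct values per key column, as seen by the planner.
-- n_distinct is negative when expressed as a fraction of the row count
-- (e.g. -1 means all values are distinct).
SELECT attname, n_distinct, null_frac
FROM pg_stats
WHERE tablename = 'obj_item_loc'
  AND attname IN ('obj_item_id', 'loc_id', 'obj_item_loc_ix');
```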

Now I have some (more) questions:

1. How do I know which index (if any) is chosen for a select statement inside a trigger during a bulk load transaction? (or for that matter: a series of recursive plpgsql functions)
2. The query planner depends on stats collected by auto-vacuum/vacuum analyze, right? Does stats collecting also happen during a lengthy transaction?
3. Is it possible (or even advisable) to trigger vacuum analyze inside an ongoing transaction? Let's say load 10,000 rows of table A, analyze table A, insert the next 10,000 rows, analyze again, ...

I'm sorry if this is basic stuff I'm asking here, but especially point 2 is bothering me.

--
Kind regards
Horst Dehmer
Post by Horst Dehmer
Except - and that's the wall I'm hitting - for one table which yielded just 75 records/second.
The main 'problem' seems to be the FK constraints. Dropping just them restored insert performance for this table to 6k records/s.
It sure sounds like you don't have enough RAM to hold the foreign-key table data needed to check the constraints, so every insert needs one disk revolution to fetch the data.
If you drop the indexes and constraints one at a time until it speeds up, is there a certain one that is the culprit?
You can look in pg_statio_user_tables to see what tables and indexes have high io being driven by the bulk loading.
Use "top" to see if the server is mostly IO bound or CPU bound.
Cheers,
Jeff
Jeff Janes
2013-01-17 18:12:06 UTC
Post by Horst Dehmer
idx_4 together with a simple select in the tables on-insert trigger is
slowing things down considerably.
So the theory is that the presence of idx_4 is causing the trigger to
pick a poor plan (i.e. one using idx_4) while its absence removes that
temptation?
True disk reads are much more expensive, but given how few reads you
have relative to hits, I now think that in aggregate the hits are more
of a concern than the reads are. In other words, you seem to be CPU
bound, not IO bound.

Even more so I think (but not with much confidence) that most of your
"reads" are actually coming from the OS cache and not from the disk.
PG cannot distinguish true disk reads from OS cache reads.

When was the last time you reset the stats? That is, are your
reported numbers accumulated over several loads, with some having idx4
and some not?

...
Post by Horst Dehmer
1. How do I know which index (if any) is chosen for a select statement
inside a trigger during a bulk load transaction? (or for that matter: a
series of recursive plpgsql functions)
Informally, reset your database to the state it was in before the
load, analyze it, and do the explain again before you do the load.

More formally, use auto_explain and set
auto_explain.log_nested_statements to true. I haven't verified this
works with triggers; just going by the description, I think it should.
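A minimal session-level setup would look roughly like this (a sketch; auto_explain ships as a contrib module, and enabling it for all sessions would instead go through shared_preload_libraries in postgresql.conf):

```sql
LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;         -- log the plan of every statement
SET auto_explain.log_nested_statements = true; -- include statements fired inside functions/triggers
SET auto_explain.log_analyze = true;           -- add actual times/row counts (adds overhead)
```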
Post by Horst Dehmer
2. The query planner depends on stats collected by auto-vacuum/vacuum
analyze, right? Does stats collecting also happen during a lengthy
transaction?
My understanding is that a transaction will not dump its stats until
the commit, so the auto analyze will not occur *due to* the lengthy
transaction until after it is over. But if the table was already due
for analyze anyway due to previous or concurrent shorter transactions,
the analyze will happen. However, the lengthy transaction might not
see the results of the analyze (I'm not clear on the transaction
snapshot semantics of the statistics tables) and even if it did see
them, it might just be using cached plans and so would not change the
plan in the middle.
Post by Horst Dehmer
3. Is it possible (or even advisable) to trigger vacuum analyze inside an
ongoing transaction. Let's say load 10,000 rows of table A, analyze table A,
insert the next 10,000 rows, analyze again, ...
You can't vacuum inside a transaction. You can analyze, but I don't
know if it would be advisable.
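If one wanted to try it anyway, the shape would be roughly this (a sketch only; whether plpgsql's cached trigger plans are rebuilt mid-transaction after the analyze is exactly the open question above):

```sql
BEGIN;
COPY obj_item_loc FROM STDIN;  -- first batch of rows
ANALYZE obj_item_loc;          -- refresh planner statistics mid-transaction
COPY obj_item_loc FROM STDIN;  -- next batch, hopefully planned with fresher stats
COMMIT;
```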

Your use case is a little unusual. If you are bulk loading into an
initially empty table, usually you would remove the trigger and add it
after the load (with some kind of bulk operation to make up for
whatever it was the trigger would have been doing). On the other
hand, if you are bulk loading into a "live" table and so can't drop
the trigger, then the live table should have good-enough preexisting
statistics to make the trigger choose a good plan.
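The first pattern can be sketched like this (the set-based catch-up statement is a placeholder, since it depends on what the trigger actually does):

```sql
ALTER TABLE obj_item_loc DISABLE TRIGGER trg_01_obj_item_loc_before_insert;

COPY obj_item_loc FROM STDIN;  -- bulk load without the per-row trigger cost

-- Set-based catch-up for whatever the trigger would have done row by row,
-- e.g. INSERT INTO side_table SELECT ... FROM obj_item_loc WHERE ...;

ALTER TABLE obj_item_loc ENABLE TRIGGER trg_01_obj_item_loc_before_insert;
```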

Cheers,

Jeff
--
Sent via pgsql-performance mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Horst Dehmer
2013-01-18 23:15:15 UTC
Hey Jeff (and others)!

First of all: thanks for your detailed explanations and guidelines.
Post by Jeff Janes
So the theory is that the presence of idx_4 is causing the trigger to
pick a poor plan (i.e. one using idx_4) while its absence removes that
temptation?
Yes. And auto_explain confirms this for the first record (obj_item_loc_obj_item_id_idx = idx_4 from my last mail):

2013-01-18 22:50:21 CET LOG: duration: 0.021 ms plan:
Query Text: SELECT * FROM obj_item_loc WHERE obj_item_loc.obj_item_id = NEW.obj_item_id AND obj_item_loc.loc_id = NEW.loc_id AND obj_item_loc.obj_item_loc_ix = NEW.obj_item_loc_ix
Index Scan using obj_item_loc_obj_item_id_idx on obj_item_loc (cost=0.00..8.27 rows=1 width=382)
Index Cond: (obj_item_id = $15)
Filter: ((loc_id = $16) AND (obj_item_loc_ix = $17))
2013-01-18 22:50:21 CET CONTEXT: SQL statement "SELECT * FROM obj_item_loc WHERE obj_item_loc.obj_item_id = NEW.obj_item_id AND obj_item_loc.loc_id = NEW.loc_id AND obj_item_loc.obj_item_loc_ix = NEW.obj_item_loc_ix"
PL/pgSQL function obj_item_loc_before_insert() line 5 at SQL statement
COPY obj_item_loc, line 1: "10903011224100007276 10903010224100015110 10900024100000029720 \N \N \N \N \N \N \N \N \N \N \N \N \..."

and for one of the last records:

2013-01-18 22:53:20 CET LOG: duration: 16.088 ms plan:
Query Text: SELECT * FROM obj_item_loc WHERE obj_item_loc.obj_item_id = NEW.obj_item_id AND obj_item_loc.loc_id = NEW.loc_id AND obj_item_loc.obj_item_loc_ix = NEW.obj_item_loc_ix
Index Scan using obj_item_loc_obj_item_id_idx on obj_item_loc (cost=0.00..8.27 rows=1 width=382)
Index Cond: (obj_item_id = $15)
Filter: ((loc_id = $16) AND (obj_item_loc_ix = $17))

I see a linear increase of the duration from 0.0x ms to over 16 ms (apart from a few nasty outliers at about 22 ms). Even at the end there are still a few durations < 0.03 ms, but mostly they are 15 ms and above.
Post by Jeff Janes
True disk reads are much more expensive, but given how few reads you
have relative to hits, I now think that in aggregate the hits are more
of a concern than the reads are. In other words, you seem to be CPU
bound, not IO bound.
Yes, definitely CPU bound, as top shows 99+% CPU utilization.
Post by Jeff Janes
Even more so I think (but not with much confidence) that most of your
"reads" are actually coming from the OS cache and not from the disk.
PG cannot distinguish true disk reads from OS cache reads.
When was the last time you reset the stats? That is, are your
reported numbers accumulated over several loads, with some having idx4
and some not?
I set up a fresh database before each test run, so the stats should be clean.
Post by Jeff Janes
More formally, use auto_explain and set
auto_explain.log_nested_statements to true. I haven't verified this
works with triggers; just going by the description, I think it should.
Nice tip! Works for triggers as well.
Post by Jeff Janes
Your use case is a little unusual. If you are bulk loading into an
initially empty table, usually you would remove the trigger and add it
after the load (with some kind of bulk operation to make up for
whatever it was the trigger would have been doing). On the other
hand, if you are bulk loading into a "live" table and so can't drop
the trigger, then the live table should have good-enough preexisting
statistics to make the trigger choose a good plan.
My case is indeed unusual, as for the whole model of 276 tables there will never be an update nor a delete on any row.
The model is rather short-lived: from a few hours to a few months. COPY FROM/TO are the only ways to get data into the database and back out. And in between there is lots of graph traversal and calculation of convex hulls. But the lengthy transactions are by far not the common case.

Having said that, I'm no longer sure if an RDBMS is the right tool for the backend. Maybe indexing and storing with a plain full-text search engine is. Dunno...

Thanks again!

--
Horst
Claudio Freire
2013-01-12 01:17:59 UTC
Post by Horst Dehmer
Except - and that's the wall I'm hitting - for one table which yielded just
75 records/second.
The main 'problem' seems to be the FK constraints. Dropping just them
restored insert performance for this table to 6k records/s. The table in
question has a composite PK (3 columns), 3 foreign keys and a bunch of
indexes (see table obj_item_loc at the end of the mail). Compared to the
other 32 tables nothing unusual.
I'd gladly supply more information if necessary.
...
Post by Horst Dehmer
CREATE TABLE obj_item_loc
(
obj_item_id numeric(20,0) NOT NULL,
loc_id numeric(20,0) NOT NULL,
obj_item_loc_ix numeric(20,0) NOT NULL,
That sounds a lot like a missing index on the target relations (or
indices that are unusable).

Those numeric ids look really unusual. Why not bigint? It's close to
the same precision, but native, faster, more compact, and quite
unambiguous when indices are involved. If the types don't match on
both tables, it's quite likely indices won't be used when checking the
FK, and that spells trouble.
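A sketch of what matched native key types would look like (with the caveat that bigint holds at most 19 decimal digits, max 9223372036854775807, so ids that genuinely need 20 digits would not fit):

```sql
CREATE TABLE loc (
    loc_id bigint PRIMARY KEY      -- native 8-byte integer key
);

CREATE TABLE obj_item_loc (
    loc_id bigint NOT NULL         -- identical type on the referencing side
        REFERENCES loc (loc_id)
    -- ... remaining columns as before ...
);
```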
Horst Dehmer
2013-01-12 20:16:02 UTC
Yes, the ids are something I don't like either.
They carry additional semantics, which I cannot make go away.
What are the chances that char(20) is more time-efficient than numeric(20)?
Disk space is no problem here.
Post by Claudio Freire
Post by Horst Dehmer
Except - and that's the wall I'm hitting - for one table which yielded just
75 records/second.
The main 'problem' seems to be the FK constraints. Dropping just them
restored insert performance for this table to 6k records/s. The table in
question has a composite PK (3 columns), 3 foreign keys and a bunch of
indexes (see table obj_item_loc at the end of the mail). Compared to the
other 32 tables nothing unusual.
I'd gladly supply more information if necessary.
...
Post by Horst Dehmer
CREATE TABLE obj_item_loc
(
obj_item_id numeric(20,0) NOT NULL,
loc_id numeric(20,0) NOT NULL,
obj_item_loc_ix numeric(20,0) NOT NULL,
That sounds a lot like a missing index on the target relations (or
indices that are unusable).
Those numeric ids look really unusual. Why not bigint? It's close to
the same precision, but native, faster, more compact, and quite
unambiguous when indices are involved. If the types don't match on
both tables, it's quite likely indices won't be used when checking the
FK, and that spells trouble.
Claudio Freire
2013-01-12 22:18:05 UTC
Post by Horst Dehmer
Yes, the ids are something I don't like either.
They carry additional semantics, which I cannot make go away.
What are the chances that char(20) is more time-efficient than numeric(20)?
Disk space is no problem here.
What are the other tables like then?

The exact data types involved are at issue here, so it matters.
Horst Dehmer
2013-01-12 22:41:53 UTC
The types referenced by the foreign keys are the same numeric(20,0).
Since the complete schema (of about 300 tables) is generated, I will just try char(20) instead of numeric(20) in the next few days to see if it makes any difference. Which I somehow doubt.

But first I'm following the lead of the tables/indexes iostats given by Jeff.


obj_item_loc references the following three tables and there should be no surprises.

CREATE UNLOGGED TABLE loc
(
loc_id numeric(20,0) NOT NULL,
...
CONSTRAINT loc_pkey PRIMARY KEY (loc_id),

)

CREATE UNLOGGED TABLE obj_item
(
obj_item_id numeric(20,0) NOT NULL,
...
CONSTRAINT obj_item_pkey PRIMARY KEY (obj_item_id),

)

CREATE UNLOGGED TABLE rptd
(
rptd_id numeric(20,0) NOT NULL,
...
CONSTRAINT rptd_pkey PRIMARY KEY (rptd_id),

)
Post by Claudio Freire
Post by Horst Dehmer
Yes, the ids are something I don't like either.
They carry additional semantics, which I cannot make go away.
What are the chances that char(20) is more time-efficient than numeric(20)?
Disk space is no problem here.
What are the other tables like then?
The exact data types involved are at issue here, so it matters.
Claudio Freire
2013-01-13 01:52:46 UTC
Post by Horst Dehmer
Since the complete schema (of about 300 tables) is generated, I will just try char(20) instead of numeric(20) in the next few days to see if it makes any difference. Which I somehow doubt.
I think that might just make it worse.

Well, maybe the others were right, and it's just that you're hitting
the disk on that particular table.

That, or it's all those CHECK constraints. Have you tried removing the
CHECK constraints? (They're a heapload of function calls.)
Jeff Janes
2013-01-12 21:23:33 UTC
Post by Claudio Freire
Post by Horst Dehmer
Except - and that's the wall I'm hitting - for one table which yielded just
75 records/second.
The main 'problem' seems to be the FK constraints. Dropping just them
restored insert performance for this table to 6k records/s. The table in
question has a composite PK (3 columns), 3 foreign keys and a bunch of
indexes (see table obj_item_loc at the end of the mail). Compared to the
other 32 tables nothing unusual.
I'd gladly supply more information if necessary.
...
Post by Horst Dehmer
CREATE TABLE obj_item_loc
(
obj_item_id numeric(20,0) NOT NULL,
loc_id numeric(20,0) NOT NULL,
obj_item_loc_ix numeric(20,0) NOT NULL,
That sounds a lot like a missing index on the target relations (or
indices that are unusable).
Those numeric ids look really unusual. Why not bigint? It's close to
the same precision, but native, faster, more compact, and quite
unambiguous when indices are involved. If the types don't match on
both tables, it's quite likely indices won't be used when checking the
FK, and that spells trouble.
Will PG allow you to add a FK constraint where there is no usable
index on the referenced side?

I have failed to do so, but perhaps I am not being devious enough.

Cheers,

Jeff
Tom Lane
2013-01-12 22:26:51 UTC
Post by Jeff Janes
Will PG allow you to add a FK constraint where there is no usable
index on the referenced side?
It will not, because the referenced side must have a unique constraint,
ie an index.
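A quick demonstration of the rejection (a sketch with throwaway table names):

```sql
CREATE TABLE parent (id bigint);  -- no PK or unique index on id
CREATE TABLE child  (parent_id bigint REFERENCES parent (id));
-- ERROR:  there is no unique constraint matching given keys
--         for referenced table "parent"
```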

The standard performance gotcha here is not having an index on the
referencing side. But that only hurts when doing UPDATEs/DELETEs of
referenced-side keys, which as far as I gathered was not the OP's
scenario.

regards, tom lane