Gunnlaugur Þór Briem
2014-11-20 11:59:31 UTC
Hi,
we have seen several instances of EXPLAIN INSERT INTO foo SELECT ... taking very long (51 minutes this morning), with disk I/O at 100% the whole time. What could cause this?
Interestingly, the actual execution logs a "temporary file" message, but the EXPLAIN execution does not (though the disk I/O suggests that a large on-disk sort is occurring):
LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp20857.0", size 1038729216
STATEMENT: INSERT INTO f_foo
SELECT [...]
There's much less disk I/O during the actual execution, maybe because much of the table is now in the page cache.
Here's the EXPLAIN query:
LOG: duration: 3079470.551 ms STATEMENT: EXPLAIN INSERT INTO f_foo
SELECT fk_d1, fk_d2, fk_d3, fk_d4, fk_d5, fk_d6, value
FROM t_foo
WHERE NOT EXISTS (SELECT *
FROM f_foo
WHERE f_foo.fk_d1 = t_foo.fk_d1
AND f_foo.fk_d2 = t_foo.fk_d2
AND f_foo.fk_d3 = t_foo.fk_d3
AND f_foo.fk_d4 = t_foo.fk_d4
AND f_foo.fk_d5 = t_foo.fk_d5
AND f_foo.fk_d6 = t_foo.fk_d6)
(where t_foo is a temp table previously populated using COPY)
Here's the EXPLAIN output:
Insert on f_foo (cost=12916290.47..14976985.88 rows=1 width=28)
-> Merge Anti Join (cost=12916290.47..14976985.88 rows=1 width=28)
Merge Cond: ((t_foo.fk_d1 = public.f_foo.fk_d1) AND (t_foo
-> Sort (cost=3846835.11..3915105.75 rows=27308256 width=28)
Sort Key: t_foo.fk_d1, t_foo.fk_d2, t_foo
-> Seq Scan on t_foo (cost=0.00..473878.56 rows=27308256 width=28)
-> Materialize (cost=9069455.35..9342037.71 rows=54516472 width=24)
-> Sort (cost=9069455.35..9205746.53 rows=54516472 width=24)
Sort Key: public.f_foo.fk_d1, public.f_foo.fk_d2, public.f_foo
-> Seq Scan on f_foo (cost=0.00..946021.72 rows=54516472 width=24)
The INSERT is indeed rather large (which is why we're issuing an EXPLAIN ahead of it to log the plan). So it's normal that the subsequent actual execution of INSERT takes that long. But I want to understand why the EXPLAIN would.
The table looks like this:
\d f_foo
Table "public.f_foo"
Column | Type | Modifiers
--------+---------+-----------
fk_d1 | integer | not null
fk_d2 | integer | not null
fk_d3 | integer | not null
fk_d4 | integer | not null
fk_d5 | integer | not null
fk_d6 | integer | not null
value | integer | not null
Indexes:
"f_foo_pkey" PRIMARY KEY, btree (fk_d1, fk_d5, fk_d3, fk_d2, fk_d4, fk_d6) CLUSTER
"ix_f_177_d4" btree (fk_d4)
"ix_f_177_d2" btree (fk_d2)
"ix_f_177_d3" btree (fk_d3)
Foreign-key constraints:
"f_177_d5_fkey" FOREIGN KEY (fk_d5) REFERENCES f_foo_d5(id) DEFERRABLE
"f_177_d1_fkey" FOREIGN KEY (fk_d1) REFERENCES f_foo_d1(id) DEFERRABLE
"f_177_d4_fkey" FOREIGN KEY (fk_d4) REFERENCES f_foo_d4(id) DEFERRABLE
"f_177_d3_fkey" FOREIGN KEY (fk_d3) REFERENCES f_foo_d3(id) DEFERRABLE
"f_177_d2_fkey" FOREIGN KEY (fk_d2) REFERENCES f_foo_d2(id) DEFERRABLE
Conceivably relevant (though I don't know how): this database has a very large number of table objects (2.2 million rows in pg_class). But other EXPLAINs are not taking anywhere near this long in this DB, it's only seen on INSERT into this and a couple of other tens-of-millions-of-rows tables.
Any pointers?
Thanks, best regards,
- Gulli
we have seen several instances of EXPLAIN INSERT INTO foo SELECT ... taking very long (51 minutes this morning), with disk I/O at 100% the whole time. What could cause this?
Interestingly, the actual execution logs a "temporary file" message, but the EXPLAIN execution does not (though the disk I/O suggests that a large on-disk sort is occurring):
LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp20857.0", size 1038729216
STATEMENT: INSERT INTO f_foo
SELECT [...]
There's much less disk I/O during the actual execution, maybe because much of the table is now in the page cache.
Here's the EXPLAIN query:
LOG: duration: 3079470.551 ms STATEMENT: EXPLAIN INSERT INTO f_foo
SELECT fk_d1, fk_d2, fk_d3, fk_d4, fk_d5, fk_d6, value
FROM t_foo
WHERE NOT EXISTS (SELECT *
FROM f_foo
WHERE f_foo.fk_d1 = t_foo.fk_d1
AND f_foo.fk_d2 = t_foo.fk_d2
AND f_foo.fk_d3 = t_foo.fk_d3
AND f_foo.fk_d4 = t_foo.fk_d4
AND f_foo.fk_d5 = t_foo.fk_d5
AND f_foo.fk_d6 = t_foo.fk_d6)
(where t_foo is a temp table previously populated using COPY)
Here's the EXPLAIN output:
Insert on f_foo (cost=12916290.47..14976985.88 rows=1 width=28)
-> Merge Anti Join (cost=12916290.47..14976985.88 rows=1 width=28)
Merge Cond: ((t_foo.fk_d1 = public.f_foo.fk_d1) AND (t_foo
-> Sort (cost=3846835.11..3915105.75 rows=27308256 width=28)
Sort Key: t_foo.fk_d1, t_foo.fk_d2, t_foo
-> Seq Scan on t_foo (cost=0.00..473878.56 rows=27308256 width=28)
-> Materialize (cost=9069455.35..9342037.71 rows=54516472 width=24)
-> Sort (cost=9069455.35..9205746.53 rows=54516472 width=24)
Sort Key: public.f_foo.fk_d1, public.f_foo.fk_d2, public.f_foo
-> Seq Scan on f_foo (cost=0.00..946021.72 rows=54516472 width=24)
The INSERT is indeed rather large (which is why we're issuing an EXPLAIN ahead of it to log the plan). So it's normal that the subsequent actual execution of INSERT takes that long. But I want to understand why the EXPLAIN would.
The table looks like this:
\d f_foo
Table "public.f_foo"
Column | Type | Modifiers
--------+---------+-----------
fk_d1 | integer | not null
fk_d2 | integer | not null
fk_d3 | integer | not null
fk_d4 | integer | not null
fk_d5 | integer | not null
fk_d6 | integer | not null
value | integer | not null
Indexes:
"f_foo_pkey" PRIMARY KEY, btree (fk_d1, fk_d5, fk_d3, fk_d2, fk_d4, fk_d6) CLUSTER
"ix_f_177_d4" btree (fk_d4)
"ix_f_177_d2" btree (fk_d2)
"ix_f_177_d3" btree (fk_d3)
Foreign-key constraints:
"f_177_d5_fkey" FOREIGN KEY (fk_d5) REFERENCES f_foo_d5(id) DEFERRABLE
"f_177_d1_fkey" FOREIGN KEY (fk_d1) REFERENCES f_foo_d1(id) DEFERRABLE
"f_177_d4_fkey" FOREIGN KEY (fk_d4) REFERENCES f_foo_d4(id) DEFERRABLE
"f_177_d3_fkey" FOREIGN KEY (fk_d3) REFERENCES f_foo_d3(id) DEFERRABLE
"f_177_d2_fkey" FOREIGN KEY (fk_d2) REFERENCES f_foo_d2(id) DEFERRABLE
Conceivably relevant (though I don't know how): this database has a very large number of table objects (2.2 million rows in pg_class). But other EXPLAINs are not taking anywhere near this long in this DB, it's only seen on INSERT into this and a couple of other tens-of-millions-of-rows tables.
Any pointers?
Thanks, best regards,
- Gulli