Limit & offset effect on query plans

Discussion:

(too old to reply)

Amitabh Kant

2012-12-13 04:08:09 UTC

Hi

Our scripts automatically add "LIMIT ALL" & "OFFSET 0" to every select
query if no values are passed on for these parameters. I remember reading
through the mailing list that it's better not to pass them if they are not
needed as they add a cost to the query plan. Is this the case, or am i
looking at a very minor optimization.

Amitabh

P.S. I haven't checked my query plans to see if there are any actual effect
of these keywords as I am still working my way through reading the output
of "Explain" ouput.

Pavan Deolasee

2012-12-13 04:43:26 UTC

Permalink

I would tend to think that is the latter. While undoubtedly
limit/offset clause will add another node during query planning and
execution, AFAICS the OFFSET 0 and LIMIT ALL cases are optimized to a
good extent. So the overhead of having them will not be significant.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
--
Sent via pgsql-performance mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Kevin Grittner

2012-12-13 15:47:14 UTC

Permalink

Post by Pavan Deolasee

Post by Amitabh Kant
Our scripts automatically add "LIMIT ALL" & "OFFSET 0" to every
select query if no values are passed on for these parameters. I
remember reading through the mailing list that it's better not
to pass them if they are not needed as they add a cost to the
query plan. Is this the case, or am i looking at a very minor
optimization.

I would tend to think that is the latter. While undoubtedly
limit/offset clause will add another node during query planning
and execution, AFAICS the OFFSET 0 and LIMIT ALL cases are
optimized to a good extent. So the overhead of having them will
not be significant.

I ran some quick tests on my i7 under Linux. Plan time was
increased by about 40 microseconds (based on EXPLAIN runtime) and
added a limit node to the plan. Execution time on a SELECT * FROM
tenk1 in the regression database went up by 1.35 ms on fully cached
runs.

-Kevin

--
Sent via pgsql-performance mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Tom Lane

2012-12-13 16:20:48 UTC

Permalink

Post by Kevin Grittner

Post by Pavan Deolasee
I would tend to think that is the latter. While undoubtedly
limit/offset clause will add another node during query planning
and execution, AFAICS the OFFSET 0 and LIMIT ALL cases are
optimized to a good extent. So the overhead of having them will
not be significant.

1.35ms out of what?

FWIW, I've been considering teaching the planner to not bother with
an actual Limit plan node if the limit clause is an obvious no-op.
I wasn't thinking about applications that blindly insert such clauses,
but rather about not penalizing subqueries when someone uses one of
these as an optimization fence. (The clauses would still work as an
opt fence, you'd just not see any Limit node in the final plan.)

regards, tom lane

--
Sent via pgsql-performance mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Kevin Grittner

2012-12-13 17:07:00 UTC

Permalink

Post by Tom Lane

Post by Kevin Grittner
I ran some quick tests on my i7 under Linux. Plan time was
increased by about 40 microseconds (based on EXPLAIN runtime)
and added a limit node to the plan. Execution time on a SELECT *
FROM tenk1 in the regression database went up by 1.35 ms on
fully cached runs.

1.35ms out of what?

Without the limit node the runtimes (after "priming" the cache)
were:

1.805, 2.533
1.805, 2.495
1.800, 2.446
1.818, 2.470
1.804, 2.502

The first time for each run is "Total runtime" reported by EXPLAIN,
the second is what psql reported from having \timing on.

With the limit node:

3.237, 3.914
3.243, 3.918
3.263, 4.010
3.265, 3.943
3.272, 3.953

I eyeballed that in the console window and said 1.35 based on rough
in-my-head calculations, although with it laid out in a nicer
format, I think I was a little low.

-Kevin

--
Sent via pgsql-performance mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Tom Lane

2012-12-13 17:12:50 UTC

Permalink

Post by Kevin Grittner

Post by Tom Lane
1.35ms out of what?

Without the limit node the runtimes (after "priming" the cache)
1.805, 2.533
1.805, 2.495
1.800, 2.446
1.818, 2.470
1.804, 2.502
The first time for each run is "Total runtime" reported by EXPLAIN,
the second is what psql reported from having \timing on.
3.237, 3.914
3.243, 3.918
3.263, 4.010
3.265, 3.943
3.272, 3.953
I eyeballed that in the console window and said 1.35 based on rough
in-my-head calculations, although with it laid out in a nicer
format, I think I was a little low.

Huh, so on a percentage basis the Limit-node overhead is actually pretty
significant, at least for a trivial seqscan plan like this case.
(This is probably about the worst-case scenario, really, since it's
tough to beat a simple seqscan for cost-per-emitted-row. Also I gather
you're not actually transmitting any data to the client ...)

regards, tom lane

--
Sent via pgsql-performance mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Kevin Grittner

2012-12-13 18:14:32 UTC

Permalink

Post by Tom Lane
Huh, so on a percentage basis the Limit-node overhead is actually
pretty significant, at least for a trivial seqscan plan like this
case. (This is probably about the worst-case scenario, really,
since it's tough to beat a simple seqscan for cost-per-emitted-
row. Also I gather you're not actually transmitting any data to
the client ...)

Right, I was trying to isolate the cost, and in a more complex
query, or with results streaming back, that could easily be lost in
the noise. Assuming that the setup time for the node is trivial
compared to filtering 10,000 rows, the time per row which passes
through the limit node seems to be (very roughly) 140 nanoseconds
on an i7. I don't know whether that will vary based on the number
or types of columns.

I just tried with returning the results rather than running EXPLAIN
ANALYZE, and any difference was lost in the noise with only five
samples each way. I wonder how much of the difference with EXPLAIN
ANALYZE might have been from the additional time checking. Maybe on
a normal run the difference would be less significant.

-Kevin

--
Sent via pgsql-performance mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance