Discussion:
Index usage for tstzrange?
(too old to reply)
Josh Berkus
2013-03-21 00:12:56 UTC
Permalink
Folks,

I just noticed that if I use a tstzrange for convenience, a standard
btree index on a timestamp won't get used for it. Example:

table a (
id int,
val text,
ts timestamptz
);
index a_ts on a(ts);

SELECT * FROM a WHERE ts <@ tstzrange('2013-01-01','2013-01-01 00:10:00')

... will NOT use the index a_ts. Is this something which could be fixed
for 9.4?
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
--
Sent via pgsql-performance mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Tom Lane
2013-03-21 03:58:50 UTC
Permalink
Post by Josh Berkus
I just noticed that if I use a tstzrange for convenience, a standard
table a (
id int,
val text,
ts timestamptz
);
index a_ts on a(ts);
... will NOT use the index a_ts.
Well, no. <@ is not a btree-indexable operator.

What I find more disturbing is that this is what I get from the example
in HEAD:

regression=# explain SELECT * FROM a WHERE ts <@ tstzrange('2013-01-01','2013-01-01 00:10:00');
ERROR: XX000: type 1184 is not a range type
LOCATION: range_get_typcache, rangetypes.c:1451

Haven't traced through it to determine exactly what's happening, but
isn't this a legitimate usage? And if it isn't, surely a more
user-facing error ought to be getting thrown somewhere upstream of here.

regards, tom lane
--
Sent via pgsql-performance mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Vasilis Ventirozos
2013-03-21 04:07:25 UTC
Permalink
Post by Tom Lane
Post by Josh Berkus
I just noticed that if I use a tstzrange for convenience, a standard
table a (
id int,
val text,
ts timestamptz
);
index a_ts on a(ts);
... will NOT use the index a_ts.
What I find more disturbing is that this is what I get from the example
tstzrange('2013-01-01','2013-01-01 00:10:00');
ERROR: XX000: type 1184 is not a range type
LOCATION: range_get_typcache, rangetypes.c:1451
Haven't traced through it to determine exactly what's happening, but
isn't this a legitimate usage? And if it isn't, surely a more
user-facing error ought to be getting thrown somewhere upstream of here.
regards, tom lane
--
http://www.postgresql.org/mailpref/pgsql-performance
It is a legit usage, this is from a test i did myself (9.2.3)

test=# explain SELECT * FROM a WHERE ts <@
tstzrange('2013-01-01','2013-04-01 00:10:00');
QUERY PLAN
------------------------------------------------------------------------------------
Seq Scan on a (cost=0.00..23.75 rows=1 width=44)
Filter: (ts <@ '["2013-01-01 00:00:00+02","2013-04-01
00:10:00+03")'::tstzrange)
Heikki Linnakangas
2013-03-21 08:52:42 UTC
Permalink
Post by Vasilis Ventirozos
Post by Tom Lane
What I find more disturbing is that this is what I get from the example
tstzrange('2013-01-01','2013-01-01 00:10:00');
ERROR: XX000: type 1184 is not a range type
LOCATION: range_get_typcache, rangetypes.c:1451
Haven't traced through it to determine exactly what's happening, but
isn't this a legitimate usage? And if it isn't, surely a more
user-facing error ought to be getting thrown somewhere upstream of here.
It is a legit usage, this is from a test i did myself (9.2.3)
tstzrange('2013-01-01','2013-04-01 00:10:00');
QUERY PLAN
------------------------------------------------------------------------------------
Seq Scan on a (cost=0.00..23.75 rows=1 width=44)
00:10:00+03")'::tstzrange)
Looks like the range type cost estimation patch broke this, back in
August already. The case of var <@ constant, where constant is a range
and var is an element, that's broken. The cost estimation function,
rangesel(), incorrectly assumes that the 'var' is always a range type.

It's a bit worrying that no-one noticed until now. I'll add a test for
that operator to the rangetypes regression test.

The immediate fix is attached, but this made me realize that rangesel()
is still missing estimation for the "element <@ range" operator. It
shouldn't be hard to implement, I'm pretty sure we have all the
statistics we need for that.

- Heikki
Heikki Linnakangas
2013-03-22 21:53:29 UTC
Permalink
On Thu, Mar 21, 2013 at 12:52 PM, Heikki Linnakangas<
The immediate fix is attached, but this made me realize that rangesel() is
be hard to implement, I'm pretty sure we have all the statistics we need
for that.
Probably we could even call existing scalarltsel and scalargtsel for this
case.
I came up with the attached. I didn't quite use scalarltsel, but I used
the scalarineqsel function, which contains the "guts" of scalarltsel and
scalargtsel.
/*
* We use the data type's default < operator. This is bogus, if the range
* type's rngsubopc operator class is different. In practice, that ought
* to be rare. It would also be bogus to use the < operator from the
* rngsubopc operator class, because the statistics are collected using
* using the default operator class, anyway.
*
* For the same reason, use the default collation. The statistics are
* collected with the default collation.
*/
Does that make sense? The other option would be to use the < operator
from the rngsubopc op class, even though the scalar statistics are
collected with the default b-tree < operator. As long as the two sort
roughly the same way, you get reasonable results either way. Yet another
option would be to use histogram_selectivity() instead of
ineq_histogram_selectivity(), if the range's rngsubopc opclass isn't the
type's default opclass. histogram_selectivity() works with any operator
regardless of the sort ordering, basically using the histogram values
merely as a sample, rather than as a histogram. But I'm reluctant to
make this any more complicated, as using a non-default opclass for the
range type is rare.

- Heikki
Josh Berkus
2013-03-22 00:05:31 UTC
Permalink
Yes, but it's equivalent to ( ( a >= b1 or b1 is null ) and ( a < b2 or
b2 is null ) ), which *is* btree-indexable and can use an index. So it
seems like the kind of optimization we could eventually make.
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
--
Sent via pgsql-performance mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Heikki Linnakangas
2013-03-22 22:53:14 UTC
Permalink
Yes, but it's equivalent to ( ( a>= b1 or b1 is null ) and ( a< b2 or
b2 is null ) ), which *is* btree-indexable and can use an index. So it
seems like the kind of optimization we could eventually make.
Yeah. The sort order of <@ is the same as regular b-tree, so it should
be possible. In fact, nothing stops you from creating the suitable
operator and b-tree support functions. See attached patch for int4, but
the same should work for timestamptz.

We should do this automatically. Or am I missing something?

- Heikki
Josh Berkus
2013-03-23 01:23:43 UTC
Permalink
Post by Heikki Linnakangas
We should do this automatically. Or am I missing something?
Aside from the need to support @> as well, not that I can see.
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
--
Sent via pgsql-performance mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Tom Lane
2013-03-23 04:31:48 UTC
Permalink
Post by Heikki Linnakangas
We should do this automatically. Or am I missing something?
Yes. This is not equality.
Post by Heikki Linnakangas
ALTER OPERATOR FAMILY integer_ops USING btree ADD
FUNCTION 1 btint4rangecmp(int4, int4range);
That will break approximately everything in sight, starting with the
planner's opinion of what equality is. There is *way* too much stuff
that knows the semantics of btree opclasses for us to start jamming
random operators into them, even if this seemed to work in trivial
testing. (See the last section of src/backend/access/nbtree/README
to just scratch the surface of the assumptions this breaks.)

It's possible that for constant ranges we could have the planner expand
"intcol <@ 'x,y'::int4range" into "intcol between x and y", using
something similar to the index LIKE optimization (ie, the "special
operator" stuff in indxpath.c). I'd like to find a way to make that
type of optimization pluggable, though --- the existing approach of
hard-wiring knowledge into indxpath.c has never been anything but
a kluge, and it definitely doesn't scale as-is to anything except
built-in types and operators.

regards, tom lane
--
Sent via pgsql-performance mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Loading...