Discussion:
Planner is getting wrong row count
AI Rumman
2013-04-02 18:57:45 UTC
Why is the planner producing a wrong row estimate for the following query?
I am using PostgreSQL 9.2.1 with default_statistics_target = 100.
I run VACUUM ANALYZE each night.

explain analyze
SELECT
    entity.id AS "Leads_id",
    entity.type AS "Leads_type",
    leads.firstname AS "Leads_firstname",
    leads.lastname AS "Leads_lastname"
FROM leads
INNER JOIN entity ON leads.leadid = entity.id
LEFT JOIN groups ON groups.groupid = entity.smownerid
LEFT JOIN users ON entity.smownerid = users.id
WHERE entity.type = 'Leads' AND entity.deleted = 0 AND leads.converted = 0
Hash Join  (cost=14067.90..28066.53 rows=90379 width=26) (actual time=536.009..1772.910 rows=337139 loops=1)
    Hash Cond: (leads.leadid = entity.id)
    ->  Seq Scan on leads  (cost=0.00..7764.83 rows=533002 width=18) (actual time=0.008..429.576 rows=532960 loops=1)
          Filter: (converted = 0)
    ->  Hash  (cost=9406.25..9406.25 rows=372932 width=16) (actual time=535.800..535.800 rows=342369 loops=1)
          Buckets: 65536  Batches: 1  Memory Usage: 16049kB
          ->  Index Scan using entity_type_idx on entity  (cost=0.00..9406.25 rows=372932 width=16) (actual time=0.030..305.250 rows=342369 loops=1)
                Index Cond: ((type)::text = 'Leads'::text)


\d leads
Table "public.leads"
Column | Type | Modifiers

------------------+------------------------+---------------------------------------
leadid | integer | not null
email | character varying(100) |
interest | character varying(50) |
firstname | character varying(100) |
salutation | character varying(200) |
lastname | character varying(100) | not null
company | character varying(200) | not null
annualrevenue | integer | default 0
industry | character varying(200) |
campaign | character varying(30) |
rating | character varying(200) |
leadstatus | character varying(50) |
leadsource | character varying(200) |
converted | integer | default 0
designation | character varying(200) | default 'SalesMan'::character varying
licencekeystatus | character varying(50) |
space | character varying(250) |
comments | text |
priority | character varying(50) |
demorequest | character varying(50) |
partnercontact | character varying(50) |
productversion | character varying(20) |
product | character varying(50) |
maildate | date |
nextstepdate | date |
fundingsituation | character varying(50) |
purpose | character varying(50) |
evaluationstatus | character varying(50) |
transferdate | date |
revenuetype | character varying(50) |
noofemployees | integer |
yahooid | character varying(100) |
assignleadchk | integer | default 0
department | character varying(200) |
emailoptout | character varying(3) | default 0
siccode | character varying(50) |
Indexes:
"leads_pkey" PRIMARY KEY, btree (leadid)
"ftx_en_leads_company" gin (to_tsvector('v_en'::regconfig,
for_fts(company::text)))
"ftx_en_leads_email" gin (to_tsvector('v_en'::regconfig,
for_fts(email::text)))
"ftx_en_leads_emailoptout" gin (to_tsvector('v_en'::regconfig,
for_fts(emailoptout::text)))
"ftx_en_leads_firstname" gin (to_tsvector('v_en'::regconfig,
for_fts(firstname::text)))
"ftx_en_leads_lastname" gin (to_tsvector('v_en'::regconfig,
for_fts(lastname::text)))
"ftx_en_leads_yahooid" gin (to_tsvector('v_en'::regconfig,
for_fts(yahooid::text)))
"leads_converted_idx" btree (converted)
"leads_leadsource_idx" btree (leadsource)
"leads_leadstatus_idx" btree (leadstatus)



\d entity
Table "public.entity"
Column | Type | Modifiers

--------------------+-----------------------------+------------------------------
id | integer | not null
smcreatorid | integer | not null default 0
smownerid | integer | not null default 0
modifiedby | integer | not null default 0
setype | character varying(30) | not null
description | text |
createdtime | timestamp without time zone | not null
modifiedtime | timestamp without time zone | not null
viewedtime | timestamp without time zone |
status | character varying(50) |
version | integer | not null default 0
presence | integer | default 1
deleted | integer | not null default 0
owner_type | character(1) | not null default 'U'::bpchar
last_activity_date | timestamp without time zone |
Indexes:
"entity_pkey" PRIMARY KEY, btree (id)
"entity_createdtime_idx" btree (createdtime)
"entity_modifiedby_idx" btree (modifiedby)
"entity_modifiedtime_idx" btree (modifiedtime)
"entity_setype_idx" btree (setype) WHERE deleted = 0
"entity_smcreatorid_idx" btree (smcreatorid)
"entity_smownerid_idx" btree (smownerid)
"ftx_en_entity_description" gin (to_tsvector('v_en'::regconfig,
for_fts(description)))
"entity_deleted_idx" btree (deleted)
Referenced by:
TABLE "service" CONSTRAINT "fk_1_service" FOREIGN KEY (serviceid) REFERENCES entity(id) ON DELETE CASCADE
TABLE "servicecontracts" CONSTRAINT "fk_1_servicecontracts" FOREIGN KEY (servicecontractsid) REFERENCES entity(id) ON DELETE CASCADE
TABLE "_emails" CONSTRAINT "fk__emails_id" FOREIGN KEY (id) REFERENCES entity(id) ON DELETE CASCADE


Please advise.

Thanks.
Kevin Grittner
2013-04-03 15:34:33 UTC
Post by AI Rumman
Why is the planner producing a wrong row estimate for the following query?
I am using PostgreSQL 9.2.1 with default_statistics_target = 100.
I run VACUUM ANALYZE each night.
Hash Join  (cost=14067.90..28066.53 rows=90379 width=26) (actual time=536.009..1772.910 rows=337139 loops=1)
    Hash Cond: (leads.leadid = entity.id)
    ->  Seq Scan on leads  (cost=0.00..7764.83 rows=533002 width=18) (actual time=0.008..429.576 rows=532960 loops=1)
          Filter: (converted = 0)
    ->  Hash  (cost=9406.25..9406.25 rows=372932 width=16) (actual time=535.800..535.800 rows=342369 loops=1)
          Buckets: 65536  Batches: 1  Memory Usage: 16049kB
          ->  Index Scan using entity_type_idx on entity  (cost=0.00..9406.25 rows=372932 width=16) (actual time=0.030..305.250 rows=342369 loops=1)
                Index Cond: ((type)::text = 'Leads'::text)
The estimates match the actual row counts well until the hash join.
That suggests there is a correlation between the join conditions and
the other selection criteria which the planner doesn't know about.
Right now PostgreSQL has no way to adjust estimates based on such
correlations.  If an inefficient plan is being chosen because of this,
there are a few tricks to coerce the planner toward a better plan.
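To illustrate the kinds of workarounds Kevin alludes to (this is a hedged sketch of common techniques on 9.2-era PostgreSQL, not part of his reply; table and column names follow the query in the thread):

```sql
-- Raise the per-column statistics target on the filtered columns and
-- re-analyze.  This sharpens the individual histograms and MCV lists,
-- although on 9.2 it still cannot capture cross-column correlation.
ALTER TABLE entity ALTER COLUMN type SET STATISTICS 1000;
ALTER TABLE leads ALTER COLUMN converted SET STATISTICS 1000;
ANALYZE entity;
ANALYZE leads;

-- If the misestimate leads to a bad plan, an "optimization fence" can
-- coerce the plan shape: a subquery with OFFSET 0 is not flattened
-- into the outer query, so its result is joined as-is.
SELECT e.id, l.firstname, l.lastname
FROM (SELECT id, smownerid
      FROM entity
      WHERE type = 'Leads' AND deleted = 0
      OFFSET 0) e
JOIN leads l ON l.leadid = e.id AND l.converted = 0;
```

On PostgreSQL 10 and later, CREATE STATISTICS can teach the planner about correlations between columns of the same table, but that feature did not exist on 9.2, and it would not directly help here anyway, since the correlation in this query spans the join.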

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-performance mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance