Discussion: Slow concurrent processing
Misa Simic
2013-03-12 03:55:26 UTC
Hi all,

We have one table with a list of "records for processing"...

We loop through that table and call one long-running function:

do_the_math_for_record(record_id)

which uses several other tables to select the rows related to the input
record_id, does some calculations, and inserts the results into two tables...

We have also made one function, process_all_records(),

which simply does: SELECT do_the_math_for_record(record_id) FROM records_for_processing
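
Roughly like this (just a sketch; the real function body may differ in details):

CREATE OR REPLACE FUNCTION process_all_records() RETURNS void AS $$
BEGIN
    PERFORM do_the_math_for_record(record_id)
    FROM    records_for_processing;   -- one call per row, in sequence
END;
$$ LANGUAGE plpgsql;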

When we run that function, it takes about 4 minutes...


There are about 300 rows in records_for_processing... We have logged the time
at the beginning and at the end of do_the_math, and noticed that processing
each row takes between 0.5 and 2 seconds...

So our do_the_math looks like:

PERFORM log_time(record_id, clock_timestamp(), 1);
PERFORM do_the_math_and_save_results(record_id);
PERFORM log_time(record_id, clock_timestamp(), 2);
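
log_time just writes its arguments into a log table; assuming that table is
called processing_log(record_id, ts, step) - an illustrative name, not the
real one - the per-record durations come out with something like:

SELECT record_id, max(ts) - min(ts) AS duration
FROM processing_log
GROUP BY record_id
ORDER BY duration DESC;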


Then we thought: if we take all the "records for processing" and process each
one in a separate connection, it should take less time...
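
The idea of the split is simple - give each connection its own subset of
record_ids. For illustration only (this is not our exact code), something like:

-- illustrative helper table: assign each record to one of 30 workers
CREATE TABLE work_queue AS
SELECT record_id,
       ntile(30) OVER (ORDER BY record_id) AS worker_no
FROM records_for_processing;

-- connection N then runs:
-- SELECT do_the_math_for_record(record_id) FROM work_queue WHERE worker_no = N;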

But we got a worse result! (using 30 concurrent connections)... about 7
minutes...

If we reduce the number of concurrent connections to 10, we get the result in
approximately the same time as sequential processing...

But if we replace do_the_math_and_save_results with pg_sleep(1), to simulate a
long-running function so that processing each row takes 1 second...

Sequential processing takes, as expected, 300 seconds!

Concurrent processing gets faster with a higher number of concurrent
connections - about 30 seconds with 30 connections! (much faster, and
expected...)

However, if we put back our do_the_math_and_save_results, we can't get better
results with concurrent processing...

With a higher number of concurrent connections the result is worse... We have
also noticed that for some records the difference between end_time and
start_time is even longer than 1 minute - but it is random, not always on the
same id... i.e. in one concurrent run a record takes 1 minute, in the next run
it takes 1 second, but then some other record takes about 1 minute...

Any idea - why? :)

This says to me that there is a lock somewhere on some tables, so our
concurrent connections are probably waiting for each other to finish... but I
can't figure out what and why...

do_the_math_and_save_results selects data from 10 other tables, calculates
something, and inserts the results into other tables...

There are about 3 tracking tables (with record_id and other data) and about 7
settings tables that we join to the tracking tables to get all the info...
Then we do the math with that info and insert the results...

We don't do any updates... (so there is no possibility that two connections
want to update the same row in the same table.)

The data from the tracking tables should be separate sets of data for two
different record_ids...

(The joined rows from the settings tables could be common to two sets for
different record_ids.)

But even if they are the same rows, SELECTs should not lock the rows in those
tables...

There are places where we do:

INSERT INTO result_table (columns)
SELECT query (tracking and settings tables joined)

Is there a chance it takes some lock somewhere?

Can the above query be run "concurrently"?
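
To see which connections are waiting and on what, I suppose a query along
these lines would help (just a sketch; these are the pre-9.2 pg_stat_activity
column names - on 9.2+ they are pid and query instead):

-- which backends are waiting, and on what ungranted lock?
SELECT w.procpid AS waiting_pid,
       w.current_query AS waiting_query,
       l.locktype,
       l.relation::regclass AS locked_relation,
       l.mode
FROM pg_stat_activity w
JOIN pg_locks l ON l.pid = w.procpid AND NOT l.granted
WHERE w.waiting;

(For row-level conflicts the ungranted lock typically shows up with locktype
'transactionid' rather than as a relation lock.)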

Many thanks,

Misa
Misa Simic
2013-03-12 14:13:06 UTC
Hi,

Digging deeper into my problem with concurrent processing, I have found:

http://postgresql.1045698.n5.nabble.com/WHY-transaction-waits-for-another-transaction-tp2142627p2142630.html


"The more likely suspect is a foreign key conflict.
Are both transactions inserting/updating rows that could reference
the same row(s) in a master table?" - Tom Lane

This is exactly the case here: several connections try to insert rows into the
same table... some columns have foreign keys to the settings tables... and
there is a possibility that two rows we want to insert reference the same row
in a settings table...

Any idea how to make this process faster?

Maybe make new tables with the same structure as the results tables, but with
no indexes, FKs, etc... During processing, insert into these unreferenced
tables; when the full process finishes, move the rows from the unreferenced
tables into the real results tables?
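
Something like this, perhaps (just a sketch; result_table is the name from my
earlier mail and the staging table name is made up):

-- staging table with the same shape as the real results table,
-- but no FKs and no indexes to check or maintain during the run
CREATE UNLOGGED TABLE result_table_staging
    (LIKE result_table INCLUDING DEFAULTS);

-- the workers insert into result_table_staging instead of result_table...

-- ...and when all workers are done, move everything over in one pass:
INSERT INTO result_table SELECT * FROM result_table_staging;
TRUNCATE result_table_staging;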


Thanks,

Misa
Jeff Janes
2013-03-12 17:09:21 UTC
Post by Misa Simic
"The more likely suspect is a foreign key conflict.
Are both transactions inserting/updating rows that could reference
the same row(s) in a master table?" - Tom Lane
This is exactly the case here: several connections try to insert rows
into the same table... some columns have foreign keys to the settings
tables... and there is a possibility that two rows we want to insert
reference the same row in a settings table...
Unless you are running an ancient version of PostgreSQL (<8.1), this would no
longer pose a problem: since 8.1 the foreign-key check takes only a shared
lock on the referenced row, so concurrent inserts that reference the same
settings row do not block each other.

Cheers,

Jeff
Misa Simic
2013-03-12 17:16:16 UTC
:(

Ah - 9.1.0 is the Postgres version (on Ubuntu)...

Thanks Jeff - you saved me some time; reorganising the functions to work with
different tables would take time... and potentially would not have given us a
solution :(

Many thanks,

Misa
Steve Crawford
2013-03-12 14:20:18 UTC
Post by Misa Simic
We have one table with a list of "records for processing"...
do_the_math_for_record(record_id)...<snip>...
But if we replace do_the_math_and_save_results with pg_sleep(1), to
simulate a long-running function so that processing each row takes 1
second... Sequential processing takes, as expected, 300 seconds!
Concurrent processing gets faster with a higher number of concurrent
connections - about 30 seconds with 30 connections! (much faster, and
expected...)
However, if we put back our do_the_math_and_save_results, we can't get
better results with concurrent processing...
Sleep will not have any significant impact on CPU, memory or disk use
and thus is not a simulation of actual processing.

All you have really shown us so far is a black box. Please provide an
overview of your schemas and the type of processing you are attempting
on them.

Cheers,
Steve
Misa Simic
2013-03-12 15:06:46 UTC
Thanks Steve

Well, the full story is too complex - but the point was that whatever the
black box does, it takes 0.5 to 2 seconds per processed record (maybe I was
wrong, but I thought the reason why it takes the time it needs - CPU/IO/memory,
whatever - is not that important...) - so I really don't see a difference
between "call a web service and insert a row in the table (takes 3 seconds)"
and "sleep 3 seconds and insert the result in the table"...

If we do the above task for two records sequentially, it takes 6 seconds...
but if we do it "concurrently", it should take 3 seconds... (in theory :) )

I guessed there was a lock somewhere, but it wasn't clear where or why, when
there are no updates - just inserts...

But I didn't know that during an INSERT a row lock is also taken on the
referenced tables, via the FK columns...

So I guess now that is the cause of the problem...

We will see how it goes with inserts into unlogged tables with no FKs...

Many thanks,

Misa
Steve Crawford
2013-03-12 15:54:57 UTC
Post by Misa Simic
Well, the full story is too complex - but the point was that whatever
the black box does, it takes 0.5 to 2 seconds per processed record...
so I really don't see a difference between "call a web service and
insert a row in the table (takes 3 seconds)" and "sleep 3 seconds and
insert the result in the table"...
If we do the above task for two records sequentially, it takes 6
seconds... but if we do it "concurrently", it should take 3 seconds...
(in theory :) )
Not at all - even in "theory." Sleep involves little, if any, contention
for resources. Real processing does. So if a process requires 100% of
available CPU then one process gets it all while many running
simultaneously will have to share the available CPU resource and thus
each will take longer to complete. Or, if you prefer, think of a file
download. If it takes an hour to download a 1GB file it doesn't mean
that you can download two 1GB files concurrently in one hour even if
"simulating" the process by a sleep(3600) suggests it is possible.

I should note, however, that depending on the resource that is limiting
your speed there is often room for optimization through simultaneous
processing - especially when processes are CPU bound. Since PostgreSQL
associates each back-end with one CPU *core*, you can have a situation
where one core is spinning and the others are more-or-less idle. In
those cases you may see an improvement by increasing the number of
simultaneous processes to somewhere shy of the number of cores.
Post by Misa Simic
I guessed there was a lock somewhere, but it wasn't clear where or why,
when there are no updates - just inserts...
But I didn't know that during an INSERT a row lock is also taken on the
referenced tables, via the FK columns...
So I guess now that is the cause of the problem...
We will see how it goes with inserts into unlogged tables with no FKs...
It will almost certainly go faster as you have eliminated integrity and
data-safety. This may be acceptable to you (non-real-time crunching of
data that can be reloaded from external sources or temporary processing
that is ultimately written back to durable storage) but it doesn't mean
you have identified the actual cause.

One thing you didn't state: is all this processing taking place in
PostgreSQL (i.e. update foo set bar = do_the_math(baz, zap, boom), where
do_the_math is PL/pgSQL, PL/Python, ...), or are external processes involved?

Cheers,
Steve
Misa Simic
2013-03-12 17:11:46 UTC
Thanks Steve,

Of course, I meant within those limits... I didn't think there was that kind
of problem (CPU/memory/IO), because there is no degradation elsewhere while
the long-running process is going... i.e. some complex query takes a similar
time whether it is run while the long-running process is off or while it is
under way... (and that query also uses the tables involved in the long-running
do_the_math function - though of course it doesn't ask for the rows the
long-running function will produce, and would not see them anyway even if it
did...)

All the processing is inside Postgres... (but there are no updates at all -
those would point me directly to a potential row-lock problem...)

Processing one record is itself a deeper sequential process, with a lot of
if/else etc...

Something like:

GetMasterInfo about the record_id (join several settings tables related to the
input record_id)

IF the record_id is of one type THEN
    apply_calculation1(record_id);
ELSE
    apply_calculation2(record_id);
END IF;

... and so on.

Then, for example, apply_calculation1 says:

Get all the records for this record_id within the related period (from the
tracking tables)...
For each day, take the status for that day, calculate the hours that fall into
different time periods during the day, and use a different rate for each - but
again, that rate in some cases depends on the total hours spent in the week
that the day belongs to, for that record_id, etc. etc...
So basically, insert into result_table1 the amounts split by category for each
day, applying different calculations for each category...
Then later sum things from result_table1 and insert them into result_table2...
and do further calculations based on the info in result_table2 and insert the
results back into the same table...
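
Just to illustrate the shape of that last step (the column names here are made
up, and p_record_id stands for the function's input parameter):

INSERT INTO result_table2 (record_id, category, total_amount)
SELECT record_id, category, sum(amount)
FROM result_table1
WHERE record_id = p_record_id
GROUP BY record_id, category;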

All that math for one record takes 0.5 to 2 seconds, depending on a lot of
things, etc...


pg_sleep(1) was just a simplified way to spend the required processing time...
it was not meant to model hardware limits and bandwidth :)


Just the fact that we can run a complex query while the long processing
function is running told me there are no hardware resource problems...


Many thanks,

Misa