Jeff Adams - NOAA Affiliate
2013-03-11 14:27:17 UTC
Greetings,
I have a large table (~90 million rows) containing vessel positions. In
addition to a column that contains the location information (the_geom), the
table also contains two columns that are used to uniquely identify the
vessel (mmsi and name) and a column containing the Unix time (epoch) at
which the position information was logged. I frequently need to assign
records to vessel transits. To do this, I currently create a CTE that uses
a Window function (partitioning the data by mmsi and name ordered by epoch)
to examine the time that has elapsed between successive position reports
for individual vessels. For every position record for a vessel (as
identified using mmsi and name), if the time elapsed between the current
position record and the previous record (using the lag function) is less
than or equal to 2 hours, I assign the record a value of 0 to a CTE column
named tr_index. If the time elapsed is greater than 2 hours, I assign the
record a value of 1 to the tr_index column. I then use the CTE to generate
transit numbers by summing the values in the tr_index field across a Window
that also partitions the data by mmsi and name and is ordered by epoch.
This works, but is very slow (hours). The table is indexed (multi-column
index on mmsi, name and index on epoch). Does anyone see a way to get what
I am after in a more efficient manner. What I am after is an assignment of
transit number to vessels' position records based on whether the records
were within two hours of each other. The SQL that I used is provided below.
Any advice would be greatly appreciated...
WITH
cte_01 AS
(
SELECT
a.id,
a.mmsi,
a.name,
a.epoch,
a.the_geom
CASE
WHEN ((a.epoch - lag(a.epoch) OVER w) / 60) > 120 THEN 1
ELSE 0
END AS tr_index
FROM table a
WINDOW w AS (PARTITION BY a.mmsi, a.name ORDER BY a.epoch)
)
SELECT
a.id,
a.mmsi,
a.name,
a.epoch,
a.the_geom,
1 + sum(a.tr_index) OVER w AS transit,
a.active
FROM cte_01 a
WINDOW w AS (PARTITION BY a.mmsi, a.name ORDER BY a.epoch)
I have a large table (~90 million rows) containing vessel positions. In
addition to a column that contains the location information (the_geom), the
table also contains two columns that are used to uniquely identify the
vessel (mmsi and name) and a column containing the Unix time (epoch) at
which the position information was logged. I frequently need to assign
records to vessel transits. To do this, I currently create a CTE that uses
a Window function (partitioning the data by mmsi and name ordered by epoch)
to examine the time that has elapsed between successive position reports
for individual vessels. For every position record for a vessel (as
identified using mmsi and name), if the time elapsed between the current
position record and the previous record (using the lag function) is less
than or equal to 2 hours, I assign the record a value of 0 to a CTE column
named tr_index. If the time elapsed is greater than 2 hours, I assign the
record a value of 1 to the tr_index column. I then use the CTE to generate
transit numbers by summing the values in the tr_index field across a Window
that also partitions the data by mmsi and name and is ordered by epoch.
This works, but is very slow (hours). The table is indexed (multi-column
index on mmsi, name and index on epoch). Does anyone see a way to get what
I am after in a more efficient manner. What I am after is an assignment of
transit number to vessels' position records based on whether the records
were within two hours of each other. The SQL that I used is provided below.
Any advice would be greatly appreciated...
WITH
cte_01 AS
(
SELECT
a.id,
a.mmsi,
a.name,
a.epoch,
a.the_geom
CASE
WHEN ((a.epoch - lag(a.epoch) OVER w) / 60) > 120 THEN 1
ELSE 0
END AS tr_index
FROM table a
WINDOW w AS (PARTITION BY a.mmsi, a.name ORDER BY a.epoch)
)
SELECT
a.id,
a.mmsi,
a.name,
a.epoch,
a.the_geom,
1 + sum(a.tr_index) OVER w AS transit,
a.active
FROM cte_01 a
WINDOW w AS (PARTITION BY a.mmsi, a.name ORDER BY a.epoch)
--
Jeff
Jeff