Channel: How to optimize Merge Anti Join? - Database Administrators Stack Exchange

How to optimize Merge Anti Join?

While trying to find the fastest way to do a bulk update of a large table, I came up with the following plan:

The setup: main_table (potentially more than 1,000 million rows) and an update table with up to a few million rows. Rows in update are keyed to main_table (both have a corresponding id column).

  1. Copy into temp_table the rows of main_table whose id is not in the update table (the update table holds the rows whose data has actually changed).
  2. Append the update table to temp_table.
  3. Drop main_table.
  4. Rename temp_table to main_table.
  5. Renumber the primary key of main_table.
  6. Create the next update with Python and repeat.
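The steps above can be tried on toy data before running them against the real table. The following is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, and the column types, the sample rows, and the function names `build_demo_db` and `rebuild_main_table` are assumptions made for illustration. SQLite has no `CREATE TABLE ... (LIKE ...)` or `SERIAL`, so the sketch declares temp_table explicitly and lets `INTEGER PRIMARY KEY` assign fresh ids at insert time, which folds the renumbering step into the copy.

```python
import sqlite3

COLS = "listing_id, date, available, price, timestamp"
SCHEMA = ("(id INTEGER PRIMARY KEY, listing_id INTEGER, date TEXT, "
          "available INTEGER, price REAL, timestamp TEXT)")


def build_demo_db():
    """Create an in-memory database with five main rows and two updated rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE main_table {SCHEMA}")
    conn.execute(f'CREATE TABLE "update" {SCHEMA}')
    main_rows = [(i, 100 + i, "2020-01-01", 1, 10.0 * i, "t0") for i in range(1, 6)]
    update_rows = [(2, 102, "2020-01-01", 0, 99.0, "t1"),
                   (4, 104, "2020-01-01", 0, 88.0, "t1")]
    conn.executemany("INSERT INTO main_table VALUES (?, ?, ?, ?, ?, ?)", main_rows)
    conn.executemany('INSERT INTO "update" VALUES (?, ?, ?, ?, ?, ?)', update_rows)
    conn.commit()
    return conn


def rebuild_main_table(conn):
    """Rebuild main_table from its unchanged rows plus the rows in "update"."""
    conn.execute(f"CREATE TABLE temp_table {SCHEMA}")
    # Step 1: anti-join, keep only rows with no matching id in "update".
    conn.execute(
        f"INSERT INTO temp_table ({COLS}) "
        f"SELECT {COLS} FROM main_table c "
        'WHERE NOT EXISTS (SELECT 1 FROM "update" u WHERE u.id = c.id) '
        "ORDER BY c.id"
    )
    # Step 2: append the changed rows; ids are assigned anew by SQLite.
    conn.execute(
        f'INSERT INTO temp_table ({COLS}) SELECT {COLS} FROM "update" ORDER BY id'
    )
    # Steps 3 and 4: swap the rebuilt table into place.
    conn.execute("DROP TABLE main_table")
    conn.execute("ALTER TABLE temp_table RENAME TO main_table")
    conn.execute('DROP TABLE "update"')
    conn.commit()


if __name__ == "__main__":
    demo = build_demo_db()
    rebuild_main_table(demo)
    for row in demo.execute("SELECT id, listing_id, price FROM main_table ORDER BY id"):
        print(row)
```

In this sketch the unchanged rows come first and the updated rows are appended after them, so the new ids no longer match the old ones, which is the same effect as dropping and re-adding the id column in the plan.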

The above is to be done with the following sequence of queries:

CREATE TABLE temp_table (LIKE main_table INCLUDING ALL);

-- Step 1: copy the rows that have no matching id in the update table.
INSERT INTO temp_table (listing_id, date, available, price, timestamp)
    (SELECT listing_id, date, available, price, timestamp
     FROM   main_table c
     WHERE  NOT EXISTS (SELECT
                        FROM   update
                        WHERE  id = c.id));

-- Step 2: append the changed rows.
INSERT INTO temp_table (listing_id, date, available, price, timestamp)
    (SELECT listing_id, date, available, price, timestamp
     FROM   update c);

-- Steps 3 and 4: swap the rebuilt table into place.
DROP TABLE main_table CASCADE;
ALTER TABLE temp_table RENAME TO main_table;

-- Step 5: renumber the primary key.
ALTER TABLE main_table DROP COLUMN id;
ALTER TABLE main_table ADD COLUMN id SERIAL PRIMARY KEY;

DROP TABLE update CASCADE;

While debugging the queries, I identified the slow one with EXPLAIN:

CREATE TABLE temp_table (LIKE main_table INCLUDING ALL);

EXPLAIN (ANALYZE, BUFFERS)
INSERT INTO temp_table (listing_id, date, available, price, timestamp)
    (SELECT listing_id, date, available, price, timestamp
     FROM   main_table c
     WHERE  NOT EXISTS (SELECT
                        FROM   update
                        WHERE  id = c.id));

This is the result of EXPLAIN (ANALYZE, BUFFERS) as requested in comments:

QUERY PLAN
Merge Anti Join  (cost=513077.42..4789463.48 rows=109800833 width=40) (actual time=1216.018..48520.564 rows=112757269 loops=1)
  Merge Cond: (c.id = update.id)
  Buffers: shared hit=4 read=1359891 written=2838, temp read=8701 written=15077
  ->  Index Scan using cals_pkey on cals c  (cost=0.57..3958576.01 rows=113119496 width=44) (actual time=0.857..29573.798 rows=113119497 loops=1)
        Buffers: shared hit=1 read=1330192 written=2838
  ->  Sort  (cost=513076.85..521373.51 rows=3318663 width=8) (actual time=1215.147..1260.191 rows=362229 loops=1)
        Sort Key: update.id
        Sort Method: external merge  Disk: 58480kB
        Buffers: shared hit=3 read=29699, temp read=8701 written=15077
        ->  Seq Scan on update  (cost=0.00..62881.63 rows=3318663 width=8) (actual time=0.179..423.100 rows=3318663 loops=1)
              Buffers: shared read=29695
Planning Time: 0.259 ms
Execution Time: 52757.349 ms
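To see where the time goes, a few figures from the plan above can be turned into more readable quantities. This is only a back-of-the-envelope sketch: the numbers are copied from the plan, the helper name `summarize_plan` is made up for illustration, and the 8192-byte block size is an assumption (it is the PostgreSQL default, but a server can be built with a different value).

```python
# Figures copied from the EXPLAIN (ANALYZE, BUFFERS) output.
BLOCK_SIZE_BYTES = 8192            # assumption: PostgreSQL default block size

EXECUTION_MS = 52757.349           # Execution Time
INDEX_SCAN_MS = 29573.798          # Index Scan, end of "actual time"
INDEX_SCAN_ROWS = 113_119_497      # rows returned by the Index Scan
ANTI_JOIN_ROWS = 112_757_269       # rows returned by the Merge Anti Join
INDEX_SCAN_BLOCKS_READ = 1_330_192 # Index Scan "shared read"
SORT_DISK_KB = 58_480              # Sort Method: external merge, Disk


def summarize_plan():
    """Derive a few headline numbers from the plan figures above."""
    return {
        # Rows that had a match in the update table and were filtered out.
        "rows_removed": INDEX_SCAN_ROWS - ANTI_JOIN_ROWS,
        # Fraction of total execution time attributed to the Index Scan node.
        "index_scan_share": INDEX_SCAN_MS / EXECUTION_MS,
        # Volume of blocks the Index Scan did not find in shared buffers.
        "index_scan_gib_read": INDEX_SCAN_BLOCKS_READ * BLOCK_SIZE_BYTES / 2**30,
        # Size of the sort spill to temporary files.
        "sort_spill_mib": SORT_DISK_KB / 1024,
    }


if __name__ == "__main__":
    for name, value in summarize_plan().items():
        print(f"{name}: {value}")
```

Under these assumptions the index scan over main_table accounts for more than half of the execution time and reads roughly 10 GiB of blocks that were not in shared buffers, while the sort of the update table spills to disk but finishes in a little over a second.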

How can I optimize these queries?

