|
|
|||||||||
|
留言 |
收藏本站 | ||||||||
|
Oracle技术站。提供各种数据库技术文档、培训教程、解决方案、案例、工具及周边新闻。同时,本站还提供oracle培训、优化、备份方案和技术支持。email: fuyuncat@gmail.com MSN: fuyuncat@hotmail.com |
|||||||||
对Group By 语句的一次优化过程
作者:
fuyuncat
来源:
www.HelloDBA.com
生产环境中发现一条语句很慢,拿回来一看,其实是一个简单的Group
By语句:
表CCMMT的数据量比较大,5M多条记录。
1、
SQL> select CDE, CID
2 from CCMMT
3 GROUP BY CDE,
CID
4 having max(ADT)
< sysdate - 180;
707924 rows selected.
Elapsed: 00:06:17.49
Execution Plan
----------------------------------------------------------
0
SELECT STATEMENT Optimizer=CHOOSE (Cost=414 Card=238583 Bytes=4771660)
1 0 FILTER
2 1 SORT (GROUP BY NOSORT)
(Cost=414 Card=238583 Bytes=4771660)
3 2 TABLE
ACCESS (BY INDEX ROWID) OF 'CCMMT' (Cost=414 Card=57969096 Bytes=1159381920)
4 3
INDEX (FULL SCAN) OF 'CCMMT_TEMP_IDX' (NON-UNIQUE) (Cost=26
Card=57969096)
Statistics
----------------------------------------------------------
0 recursive
calls
0 db
block gets
2769177 consistent gets
1089991 physical reads
0 redo
size
23926954 bytes
sent via SQL*Net to client
519785 bytes received via SQL*Net from
client
47196 SQL*Net
roundtrips to/from client
0 sorts
(memory)
0 sorts
(disk)
707924 rows processed
要6min多返回。尝试调整语句写法,用minus代替Group By:
2、
SQL> select DISTINCT CDE, CID
2 from CCMMT
3 where ADT <
sysdate - 180
4 minus
5 select
DISTINCT CDE, CID
6 from CCMMT
7 where ADT >=
sysdate - 180;
707924 rows selected.
Elapsed: 00:00:21.53
Execution Plan
----------------------------------------------------------
0
SELECT STATEMENT Optimizer=CHOOSE (Cost=190624 Card=2794940
Bytes=111797600)
1 0 MINUS
2 1 SORT* (UNIQUE)
(Cost=95312 Card=2794940 Bytes=55898800) :Q13049001
3 2 INDEX*
(FAST FULL SCAN) OF 'CCMMT_UQ1' (UNIQUE) (Cost=77305 Card=2898455
Bytes=57969100)
:Q13049000
4 1 SORT* (UNIQUE)
(Cost=95312 Card=2794940 Bytes=55898800)
:Q13050001
5 4 INDEX*
(FAST FULL SCAN) OF 'CCMMT_UQ1' (UNIQUE) (Cost=77305 Card=2898455
Bytes=57969100)
:Q13050000
2 PARALLEL_TO_SERIAL
SELECT DISTINCT C0 C0,C1 C1 FROM :Q13049000
ORDER BY C0,C1
3 PARALLEL_TO_PARALLEL
SELECT /*+ INDEX_RRS(A1 "CCMMT_UQ1")*/ A1."CDE"
C0,A1."CA
4 PARALLEL_TO_SERIAL
SELECT DISTINCT C0 C0,C1 C1 FROM :Q13050000
ORDER BY C0,C1
5 PARALLEL_TO_PARALLEL
SELECT /*+ INDEX_RRS(A1 "CCMMT_UQ1")*/ A1."CDE"
C0,A1."CA
Statistics
----------------------------------------------------------
0
recursive calls
33 db
block gets
126566 consistent gets
129243 physical reads
0 redo
size
18461368 bytes
sent via SQL*Net to client
519785 bytes received via SQL*Net from
client
47196 SQL*Net
roundtrips to/from client
4 sorts
(memory)
2 sorts
(disk)
707924 rows processed
效果不错,Consistent gets 和 Physical Reads都下降了,同时只需要21s就返回了。但从查询计划看,用到了并行查询,因此会消耗更多的CPU。
在(ADT, CDE, CID )上创建索引,再次执行:
3、
SQL> select DISTINCT CDE, CID
2 from CCMMT
3 where ADT <
sysdate - 180
4 minus
5 select
DISTINCT CDE, CID
6 from CCMMT
7 where ADT >=
sysdate - 180;
707924 rows selected.
Elapsed: 00:00:26.94
Execution Plan
----------------------------------------------------------
0
SELECT STATEMENT Optimizer=CHOOSE (Cost=36018 Card=2794940
Bytes=111797600)
1 0 MINUS
2 1 SORT (UNIQUE)
(Cost=18009 Card=2794940 Bytes=55898800)
3 2 INDEX
(RANGE SCAN) OF 'CCMMT_IDX3' (NON-UNIQUE) (Cost=2 Card=2898455 Bytes=57969100)
4 1 SORT (UNIQUE)
(Cost=18009 Card=2794940 Bytes=55898800)
5 4 INDEX
(RANGE SCAN) OF 'CCMMT_IDX3' (NON-UNIQUE) (Cost=2 Card=2898455 Bytes=57969100)
Statistics
----------------------------------------------------------
0
recursive calls
118 db block gets
22565 consistent
gets
31604 physical
reads
0 redo
size
18461368 bytes
sent via SQL*Net to client
519785 bytes received via SQL*Net from
client
47196 SQL*Net
roundtrips to/from client
1 sorts
(memory)
1 sorts
(disk)
707924 rows processed
效果也比较理想,consistent gets和physical reads再次大大下降,返回时间和上面差不多,在一个数量级上,但是不再使用并行查询了。
用NOT Exists代替minus:
4、
SQL> select DISTINCT CDE, CID
2 from CCMMT a
3 where ADT <
sysdate - 180
4 AND NOT EXISTS
5 (SELECT CDE,
CID FROM
6 (select
DISTINCT CDE, CID
7 from CCMMT
8 where ADT >=
sysdate - 180) b
9 WHERE a.CDE = b.CDE
10 AND a.CID = b.CID);
707924 rows selected.
Elapsed: 00:10:35.70
Execution Plan
----------------------------------------------------------
0
SELECT STATEMENT Optimizer=CHOOSE (Cost=600 Card=144923 Bytes=2898460)
1 0 SORT (UNIQUE) (Cost=600 Card=144923
Bytes=2898460)
2 1 INDEX (RANGE SCAN) OF
'CCMMT_IDX3' (NON-UNIQUE)(Cost=2 Card=144923
Bytes=2898460)
3 2 TABLE
ACCESS (BY INDEX ROWID) OF 'CCMMT' (Cost=2 Card=1 Bytes=20)
4 3
INDEX (RANGE SCAN) OF 'CCMMT_TEMP_IDX' (NON-UNIQUE) (Cost=1 Card=9)
Statistics
----------------------------------------------------------
5
recursive calls
118 db block gets
40535587 consistent gets
3157604 physical reads
0 redo
size
18461368 bytes
sent via SQL*Net to client
519785 bytes received via SQL*Net from
client
47196 SQL*Net
roundtrips to/from client
2 sorts
(memory)
1 sorts
(disk)
707924 rows processed
FT! consistent gets和physical reads爆涨,10min才返回结果!
用Not In换掉Not Exists:
5、
SQL> select DISTINCT CDE, CID
2 from CCMMT a
3 where ADT <
sysdate - 180
4 AND (CDE, CID)
NOT IN
5 (select
DISTINCT CDE, CID
6 from CCMMT
7 where ADT >=
sysdate - 180);
707924 rows selected.
Elapsed: 00:01:00.70
Execution Plan
----------------------------------------------------------
0
SELECT STATEMENT Optimizer=CHOOSE (Cost=36425 Card=1 Bytes=40)
1 0 SORT (UNIQUE NOSORT) (Cost=36425
Card=1 Bytes=40)
2 1 MERGE JOIN (ANTI)
(Cost=36423 Card=1 Bytes=40)
3 2 SORT (JOIN)
(Cost=18212 Card=2898455 Bytes=57969100)
4 3
INDEX (RANGE SCAN) OF 'CCMMT_IDX3' (NON-UNIQUE) (Cost=2 Card=2898455
Bytes=57969100)
5 2 SORT
(UNIQUE) (Cost=18212 Card=2898455 Bytes=57969100)
6 5
INDEX (RANGE SCAN) OF 'CCMMT_IDX3' (NON-UNIQUE) (Cost=2 Card=2898455
Bytes=57969100)
Statistics
----------------------------------------------------------
0
recursive calls
419 db block gets
22565 consistent
gets
98692 physical
reads
0 redo
size
18461368 bytes
sent via SQL*Net to client
519785 bytes received via SQL*Net from
client
47196 SQL*Net
roundtrips to/from client
1 sorts
(memory)
1 sorts
(disk)
707924 rows processed
恩,consistent gets和建了索引时的minus方式一样,但是physical reads太大,返回时间太长---1min。同时用到了刚才建的索引。(呵呵,所以说,NOT EXISTS并不是什么情况下都比NOT IN更优啊)
在尝试用left join + is null代替not in:
6、
SQL> SELECT a.CDE, a.CID
2 FROM
3 (select
DISTINCT CDE, CID
4 from CCMMT
5 where ADT <
sysdate - 180) a,
6 (select
DISTINCT CDE, CID
7 from CCMMT
8 where ADT >=
sysdate - 180) b
9 WHERE a.CDE = b.CDE(+)
10 AND a.CID = b.CID(+)
11 AND b.CDE IS NULL;
707924 rows selected.
Elapsed: 00:00:25.46
Execution Plan
----------------------------------------------------------
0
SELECT STATEMENT Optimizer=CHOOSE (Cost=54675 Card=2794940
Bytes=117387480)
1 0 FILTER
2 1 MERGE JOIN (OUTER)
3 2 VIEW
(Cost=18009 Card=2794940 Bytes=58693740)
4 3
SORT (UNIQUE) (Cost=18009 Card=2794940 Bytes=55898800)
5 4
INDEX (RANGE SCAN) OF 'CCMMT_IDX3' (NON-UNIQUE) (Cost=2 Card=2898455
Bytes=57969100)
6 2 SORT (JOIN)
(Cost=36667 Card=2794940 Bytes=58693740)
7 6
VIEW (Cost=18009 Card=2794940 Bytes=58693740)
8 7
SORT (UNIQUE) (Cost=18009 Card=2794940 Bytes=55898800)
9 8
INDEX (RANGE SCAN) OF
'CCMMT_IDX3' (NON-UNIQUE) (Cost=2 Card=2898455 Bytes=57969100)
Statistics
----------------------------------------------------------
10
recursive calls
118 db block gets
22569 consistent
gets
31300 physical
reads
0 redo
size
18461368 bytes
sent via SQL*Net to client
519785 bytes received via SQL*Net from
client
47196 SQL*Net
roundtrips to/from client
6 sorts
(memory)
1 sorts
(disk)
707924 rows processed
效果不错,和有索引时使用minus在同一数量级上。
总结,以上几种方式中,效果最好的应该是第3种和第6种,buffer gets、磁盘IO和CPU消耗都比较少,返回时间大大减少,但是需要新建一个索引,消耗更多磁盘空间,并存在影响其它语句的正常查询计划的风险。而第2种方式应该是次好的。在返回时间上,和上面两种差不多,不需要新的索引,但是会消耗更多的内存、磁盘和CPU资源。
出于综合考虑,采用了第2种方式对生产库进行了优化。
(以上例子中的对象名进行了替换,其他都是原版)