[中文]
Author:
fuyuncat
Source:
www.HelloDBA.com
Date:
2009-08-01 15:03:33
3.4.15. Operator (LIKE)
With the experience on previous formula derivation, the process is smooth in the LIKE derivation. I directly deduce the formula with mixed data types and mixed NDV (In fact, conclusion in the article 查询计划中Cardinality的计算 told us the selectivity of LIKE is same as the one of >, both is 1/20. It can be proved by following test):
A:160
B:80
C:100
D:200
E:400
A:VARCHAR2
B:NUMBER
C:DATE
D:DATE
E:NUMBER
EFFQRYCOLNUM
0
FLTCOLNUM
FILTER
COST_CPU
CPU_A
TYPFAC
CACULATED
1
a like :v1
257121440
7121440
250000000
250
2
a like :v1 and b like :v3
287121440
7121440
280000000
280
3
a like :v1 and b like :v2 and c like :v3
307996440
7121440
300875000
300.875
4
a like :v1 and b like :v2 and c like :v3 d like :v4
328040190
7121440
320918750
320.91875
From the test, the final rule is that the selectivity (1/NDV) is a fixed number, 1/20; however, it will increase the COLTYPFAC by 50. For example, the third one in above data set, the calculation is,
130+3*20+100*1+200*(1/20)+350*(1/20)^2 = 300.875
Then take the effective query columns into account,
EFFQRYCOLNUM
1
FLTCOLNUM
FILTER
COST_CPU
CPU_A
TYPFAC
CACULATED
1
a like :v1
258121440
7121440
251000000
251
2
a like :v1 and b like :v3
287171440
7121440
280050000
280.05
3
a like :v1 and b like :v2 and c like :v3
307998940
7121440
300877500
300.8775
4
a like :v1 and b like :v2 and c like :v3 d like :v4
328040310
7121440
320918870
320.91887
EFFQRYCOLNUM
2
FLTCOLNUM
FILTER
COST_CPU
CPU_A
TYPFAC
CACULATED
1
a like :v1
287221440
7121440
280100000
280.1
2
a like :v1 and b like :v3
308001440
7121440
300880000
300.88
3
a like :v1 and b like :v2 and c like :v3
328040430
7121440
320918990
320.91899
4
a like :v1 and b like :v2 and c like :v3 d like :v4
259121440
7121440
252000000
252
Combine with the rule above, all of the results matched the results calculated by formula 18.
3.4.16. Other operators (<>, IN and NOT IN)
Base on the conclusion in 查询计划中Cardinality的计算, the selectivity of the other operators with bind variable is,
<>: 1/NDV
IN: (variable number)*(1/NDV)
NOT IN: (1-1/NDV)^(variable number)
Following the way I worked on < & =, it’s not hard to find the formulas.
3.4.16.1. Operator (<>)
Here is the test data for <>:
A:160
B:80
C:100
D:200
E:400
A:VARCHAR2
B:NUMBER
C:DATE
D:DATE
E:NUMBER
EFFQRYCOLNUM
0
FLTCOLNUM
FILTER
COST_CPU
CPU_A
CPU_B
TYPFAC
CACULATED
1
a <> :v1
207121440
7121440
200000000
200
200.00
EFFQRYCOLNUM
2
FLTCOLNUM
FILTER
COST_CPU
CPU_A
CPU_B
TYPFAC
CACULATED
1
a <> :v1
246871440
7121440
239750000
239.75
239.75
2
a <> :v1 and b <> :v3
415437060
7121440
408315620
408.31562
408.31562
3
a <> :v1 and b <> :v2 and c <> :v3
729442978
7121440
722321538
722.321538
722.3215375
4
a <> :v1 and b <> :v2 and c <> :v3 d <> :v4
1040703111
7121440
1033581671
1033.581671
1033.581671
As said, the selectivity of <> is (1-1/NDV). Base on this, the data above are matched the result calculated by formula 18. Take the third one as example,
130+3*20+50*1+150*(1-1/160)+300*(1-1/160)*(1-1/80)+2*20*MAX(1,ROUND((1-1/160)*(1-1/80)*(1-1/100)*1000000,0))/1000000 = 722.3215375
3.4.16.2. Operator (IN)
After involved IN, it not only affected selectivity, but also affected the COLTYPFAC.
A:160
B:80
C:100
D:200
E:400
A:VARCHAR2
B:NUMBER
C:DATE
D:DATE
E:NUMBER
EFFQRYCOLNUM
0
FLTCOLNUM
FILTER
COST_CPU
CPU_A
CPU_B
TYPFAC
CACULATED
1
a in (:a1)
207121440
7121440
200000000
200
200.00
1
a in (:a1, :a2)
256808940
7121440
249687500
249.6875
249.6875
1
a in (:a1, :a2, :a3)
306185893
7121440
299064453
299.064453
299.0644531
EFFQRYCOLNUM
2
FLTCOLNUM
FILTER
COST_CPU
CPU_A
CPU_B
TYPFAC
CACULATED
1
a in (:a1, :a2)
257308940
7121440
250187500
250.1875
250.1875
2
a in (:a1, :a2, :a3) and b in (:b1, :b2)
331794497
7121440
324673057
324.673057
324.6730569
3
a in (:a1, :a2) and b in (:b1) and c in (:c1, :c2, :c3, :c4)
298868886
7121440
291747446
291.747446
291.7474462
4
a in (:a1, :a2, :a3) and b in (:b1, :b2) and c in (:c1, :c2) and d in (:d1, :d2, :d3, :d4)
372066787
7121440
364945347
364.945347
364.9453465
With more efforts, the rule of the new COLTYPFAC finally be found.
COLTYPFAC_NEW = COLTYPEFAC_ORI*(1-1/NDV)^0+COLTYPEFAC_ORI*(1-1/NDV)^1+…+COLTYPEFAC_ORI*(1-1/NDV)^(INNUM-1)
The INNUM is the variable number of the IN.
Besides, seems the filter column process sequence is based on (1-1/NDV), not (1/NDV). (We will find the real rule later)
The calculation of the third one is,
130+3*20+(50*(1-1/160)^0+50*(1-1/160)^1)*1+(150*(1-1/100)^0)*(2*1/160)+(300*(1-1/100)^0+300*(1-1/100)^1+300*(1-1/100)^2+300*(1-1/100)^3)*(2*1/160)*(1*1/80)+2*20*MAX(1,ROUND((2*1/160)*(1*1/80)*(4*1/100)*1000000,0))/1000000 = 291.7474462
3.4.16.3. Operator (NOT IN)
Except the selectivity, all of the rules of NOT IN are same as the ones of IN.
A:160
B:80
C:100
D:200
E:400
A:VARCHAR2
B:NUMBER
C:DATE
D:DATE
E:NUMBER
EFFQRYCOLNUM
2
FLTCOLNUM
FILTER
COST_CPU
CPU_A
CPU_B
TYPFAC
CACULATED
1
a not in (:a1, :a2)
296310500
7121440
289189060
289.18906
289.18906
2
a not in (:a1, :a2, :a3) and b not in (:b1, :b2)
657035353
7121440
649913913
649.913913
649.9139131
3
a not in (:a1, :a2) and b not in (:b1) and c not in (:c1, :c2, :c3, :c4)
1635207532
7121440
1628086092
1628.086092
1628.086092
4
a not in (:a1, :a2, :a3) and b not in (:b1, :b2) and c not in (:c1, :c2) and d not in (:d1, :d2, :d3, :d4)
2383966666
7121440
2376845226
2376.845226
2376.845226
The selectivity of NOT IN is (1-1/NDV)^NOTINNUM. NOTINNUM is the variable number of NOT IN. Calculate the third one by formula 18:
130+4*20+(50+50*(1-1/160)+50*(1-1/160)*(1-1/160))*1+(150+150*(1-1/80))*(1-1/160)^3+(300+300*(1-1/100))*(1-1/160)^3*(1-1/80)^2+(300+300*(1-1/200)+300*(1-1/200)*(1-1/200)+300*(1-1/200)*(1-1/200)*(1-1/200))*(1-1/160)^3*(1-1/80)^2*(1-1/100)^2+2*20*MAX(1,ROUND((1-1/160)^3*(1-1/80)^2*(1-1/100)^2*(1-1/200)^4*1000000,0))/1000000 = 2376.845226
3.4.17. Table record number, again
You may carefully find the result calculated has been ROUNDED. For example, In the example of IN case, the formula calculated value of TYPFAC is 291.7474462, while the TYPFAC calculated by COST_CPU (the real TYPFAC) is 291.747446. By adjusting the TABROWS, it’s easy to find the rule of ROUNDING:
Formula 19: TYPFAC_COMP = ROUND (TYPFAC * TABROWS, 0) / TABROWS
Tip: TABROWS in the denominator here is ultimately can be counteracted with the TABROWS in CPU_B, but to keep along with the process before, I will not to remove it.
3.4.18. the uniform formula for all operators
Consider all of the formulas of the operators; we can draw a uniform formula for them,
Formula 20: TYPFAC = ROUND((130 + MAXFLTCOLPOS*20+ COLTYPEFAC1*1+ COLTYPEFAC2*COLSEL1 + … + COLTYPEFACn*COLSEL1*…*COLSELn-1 + 20*EFFQRYCOLNUM*MAX(1,ROUND(COLSEL1*…*COLSELn*TABROWS,0))/TABROWS)*TABROWS,0)/TABROWS
Among it,
数据类型
COLTYPEFAC
CHAR
50
VARCHAR
50
NUMBER
150
DATE
300
And here is the selectivity (COLSEL) calculation,
匹配符
COLSEL
>、<、<=、>=
1/20
LIKE
1/20
=
1/NDV
<>
1-1/NDV
IN
NOTINNUM*1/NDV
NOT IN
(1-1/NDV)^NOTINNUM
When operator is LIKE, the COLTYPFAC should be added 50;
When operator is IN or NOT IN, the COLTYPFAC should be calculated as, COLTYPEFAC*(1-1/NDV)^0+COLTYPEFAC*(1-1/NDV)^1+…+COLTYPEFAC*(1-1/NDV)^(INNUM-1)
3.4.19. Mixed operators
It’s time to study the mixed operators after the uniform formula be found. We assume the formula 20 still work in this case. Then what we need to do is to find out the real filter column process sequence.
With a large number of test data, I found the sequence is actually not such simple:
a > :v1 and b like :v2 and c IN (:c1, :c2) and d NOT IN (:d1, :d2) and e = :v5
TABROWS
10000
EFFQRYCOLNUM
0
ALLCOLDATATYPE
DATE
E_NDV:
400
ALLCOLNDV
200
12429415
7121440
5307975
530.7975
530.7975
TABROWS
1000000
537918968
7121440
530797528
530.797528
530.7975
a > :v1 and e = :v5
537871440
7121440
530750000
530.75
530.75
a > :v1 and b like :v2 and e = :v5
537915190
7121440
530793750
530.79375
530.79375
a > :v1 and b like :v2 and c IN (:c1, :c2) and d NOT IN (:d1, :d2) and e = :v5
TABROWS
1000000
EFFQRYCOLNUM
0
ALLCOLNDV
200
ALLCOLDATATYPE
DATE
287396640
7121440
280275200
280.2752
280.2751869
a > :v1 and b like :v2 and c IN (:c1, :c2)
252158940
7121440
245037500
245.0375
245.0375
From these data, the final sequence rule can be concluded as, when compare 2 filter columns, the optimizer will calculate 2 piece of data, one calculated by the COLTYPFAC of A product the COLSEL of B (sequence is A-B), and the other calculated by the COLTYPFAC of B product the COLSEL of A (sequence is B-A), then compare the result, choose the small one.