HelloDBA [中文]
Search Internet Search HelloDBA
  Oracle Technology Site. email: fuyuncat@gmail.com  MSN: fuyuncat@hotmail.com Add to circle  acoug  acoug 

Full Table Scan cost formula cracking(5)

[中文]

Author:  fuyuncat

Source:  www.HelloDBA.com

Date:  2009-08-01 15:03:33

Share to  Twitter Facebook GMail Blogger Orkut Google Bookmarks Sina Weibo QQ Renren Tieba Kaixin Douban Taobao

3.4.15.           Operator (LIKE)

With the experience on previous formula derivation, the process is smooth in the LIKE derivation. I directly deduce the formula with mixed data types and mixed NDV (In fact, conclusion in the article 查询计划中Cardinality的计算 told us the selectivity of LIKE is same as the one of >, both is 1/20. It can be proved by following test):

A:160

B:80

C:100

D:200

E:400

 

A:VARCHAR2

B:NUMBER

C:DATE

D:DATE

E:NUMBER

 

EFFQRYCOLNUM

0

 

 

 

 

FLTCOLNUM

FILTER

COST_CPU

CPU_A

TYPFAC

CACULATED

1

a like :v1

257121440

7121440

250000000

250

2

a like :v1 and b like :v3

287121440

7121440

280000000

280

3

a like :v1 and b like :v2 and c like :v3

307996440

7121440

300875000

300.875

4

a like :v1 and b like :v2 and c like :v3 d like :v4

328040190

7121440

320918750

320.91875

From the test, the final rule is that the selectivity (1/NDV) is a fixed number, 1/20; however, it will increase the COLTYPFAC by 50. For example, the third one in above data set, the calculation is,

130+3*20+100*1+200*(1/20)+350*(1/20)^2 = 300.875

Then take the effective query columns into account,

EFFQRYCOLNUM

1

 

 

 

 

FLTCOLNUM

FILTER

COST_CPU

CPU_A

TYPFAC

CACULATED

1

a like :v1

258121440

7121440

251000000

251

2

a like :v1 and b like :v3

287171440

7121440

280050000

280.05

3

a like :v1 and b like :v2 and c like :v3

307998940

7121440

300877500

300.8775

4

a like :v1 and b like :v2 and c like :v3 d like :v4

328040310

7121440

320918870

320.91887

EFFQRYCOLNUM

2

 

 

 

 

FLTCOLNUM

FILTER

COST_CPU

CPU_A

TYPFAC

CACULATED

1

a like :v1

287221440

7121440

280100000

280.1

2

a like :v1 and b like :v3

308001440

7121440

300880000

300.88

3

a like :v1 and b like :v2 and c like :v3

328040430

7121440

320918990

320.91899

4

a like :v1 and b like :v2 and c like :v3 d like :v4

259121440

7121440

252000000

252

Combine with the rule above, all of the results matched the results calculated by formula 18.

3.4.16.           Other operators (<>, IN and NOT IN)

Base on the conclusion in 查询计划中Cardinality的计算, the selectivity of the other operators with bind variable is,

<>: 1/NDV

IN: (variable number)*(1/NDV)

NOT IN: (1-1/NDV)^(variable number)

Following the way I worked on < & =, it’s not hard to find the formulas.

3.4.16.1.             Operator (<>)

Here is the test data for <>:

A:160

B:80

C:100

D:200

E:400

 

 

A:VARCHAR2

B:NUMBER

C:DATE

D:DATE

E:NUMBER

 

 

EFFQRYCOLNUM

0

 

 

 

 

 

FLTCOLNUM

FILTER

COST_CPU

CPU_A

CPU_B

TYPFAC

CACULATED

1

a <> :v1

207121440

7121440

200000000

200

200.00

EFFQRYCOLNUM

2

 

 

 

 

 

FLTCOLNUM

FILTER

COST_CPU

CPU_A

CPU_B

TYPFAC

CACULATED

1

a <> :v1

246871440

7121440

239750000

239.75

239.75

2

a <> :v1 and b <> :v3

415437060

7121440

408315620

408.31562

408.31562

3

a <> :v1 and b <> :v2 and c <> :v3

729442978

7121440

722321538

722.321538

722.3215375

4

a <> :v1 and b <> :v2 and c <> :v3 d <> :v4

1040703111

7121440

1033581671

1033.581671

1033.581671

As said, the selectivity of <> is (1-1/NDV). Base on this, the data above are matched the result calculated by formula 18. Take the third one as example,

130+3*20+50*1+150*(1-1/160)+300*(1-1/160)*(1-1/80)+2*20*MAX(1,ROUND((1-1/160)*(1-1/80)*(1-1/100)*1000000,0))/1000000 = 722.3215375

3.4.16.2.             Operator (IN)

After involved IN, it not only affected selectivity, but also affected the COLTYPFAC.

A:160

B:80

C:100

D:200

E:400

 

 

A:VARCHAR2

B:NUMBER

C:DATE

D:DATE

E:NUMBER

 

 

EFFQRYCOLNUM

0

 

 

 

 

 

 

FLTCOLNUM

FILTER

COST_CPU

CPU_A

CPU_B

TYPFAC

CACULATED

 

1

a in (:a1)

207121440

7121440

200000000

200

200.00

 

1

a in (:a1, :a2)

256808940

7121440

249687500

249.6875

249.6875

 

1

a in (:a1, :a2, :a3)

306185893

7121440

299064453

299.064453

299.0644531

 

EFFQRYCOLNUM

2

 

 

 

 

 

 

FLTCOLNUM

FILTER

COST_CPU

CPU_A

CPU_B

TYPFAC

CACULATED

 

1

a in (:a1, :a2)

257308940

7121440

250187500

250.1875

250.1875

 

2

a in (:a1, :a2, :a3) and b in (:b1, :b2)

331794497

7121440

324673057

324.673057

324.6730569

 

3

a in (:a1, :a2) and b in (:b1) and c in (:c1, :c2, :c3, :c4)

298868886

7121440

291747446

291.747446

291.7474462

 

4

a in (:a1, :a2, :a3) and b in (:b1, :b2) and c in (:c1, :c2) and d in (:d1, :d2, :d3, :d4)

372066787

7121440

364945347

364.945347

364.9453465

 

               

With more efforts, the rule of the new COLTYPFAC finally be found.

COLTYPFAC_NEW = COLTYPEFAC_ORI*(1-1/NDV)^0+COLTYPEFAC_ORI*(1-1/NDV)^1+…+COLTYPEFAC_ORI*(1-1/NDV)^(INNUM-1)

The INNUM is the variable number of the IN.

Besides, seems the filter column process sequence is based on (1-1/NDV), not (1/NDV). (We will find the real rule later)

The calculation of the third one is,

130+3*20+(50*(1-1/160)^0+50*(1-1/160)^1)*1+(150*(1-1/100)^0)*(2*1/160)+(300*(1-1/100)^0+300*(1-1/100)^1+300*(1-1/100)^2+300*(1-1/100)^3)*(2*1/160)*(1*1/80)+2*20*MAX(1,ROUND((2*1/160)*(1*1/80)*(4*1/100)*1000000,0))/1000000 = 291.7474462

3.4.16.3.             Operator (NOT IN)

Except the selectivity, all of the rules of NOT IN are same as the ones of IN.

A:160

B:80

C:100

D:200

E:400

 

 

A:VARCHAR2

B:NUMBER

C:DATE

D:DATE

E:NUMBER

 

 

EFFQRYCOLNUM

2

 

 

 

 

 

FLTCOLNUM

FILTER

COST_CPU

CPU_A

CPU_B

TYPFAC

CACULATED

1

a not in (:a1, :a2)

296310500

7121440

289189060

289.18906

289.18906

2

a not in (:a1, :a2, :a3) and b not in (:b1, :b2)

657035353

7121440

649913913

649.913913

649.9139131

3

a not in (:a1, :a2) and b not in (:b1) and c not in (:c1, :c2, :c3, :c4)

1635207532

7121440

1628086092

1628.086092

1628.086092

4

a not in (:a1, :a2, :a3) and b not in (:b1, :b2) and c not in (:c1, :c2) and d not in (:d1, :d2, :d3, :d4)

2383966666

7121440

2376845226

2376.845226

2376.845226

The selectivity of NOT IN is (1-1/NDV)^NOTINNUM. NOTINNUM is the variable number of NOT IN. Calculate the third one by formula 18:

130+4*20+(50+50*(1-1/160)+50*(1-1/160)*(1-1/160))*1+(150+150*(1-1/80))*(1-1/160)^3+(300+300*(1-1/100))*(1-1/160)^3*(1-1/80)^2+(300+300*(1-1/200)+300*(1-1/200)*(1-1/200)+300*(1-1/200)*(1-1/200)*(1-1/200))*(1-1/160)^3*(1-1/80)^2*(1-1/100)^2+2*20*MAX(1,ROUND((1-1/160)^3*(1-1/80)^2*(1-1/100)^2*(1-1/200)^4*1000000,0))/1000000 = 2376.845226

3.4.17.           Table record number, again

You may carefully find the result calculated has been ROUNDED. For example, In the example of IN case, the formula calculated value of TYPFAC is 291.7474462, while the TYPFAC calculated by COST_CPU (the real TYPFAC) is 291.747446. By adjusting the TABROWS, it’s easy to find the rule of ROUNDING:

Formula 19: TYPFAC_COMP = ROUND (TYPFAC * TABROWS, 0) / TABROWS

Tip: TABROWS in the denominator here is ultimately can be counteracted with the TABROWS in CPU_B, but to keep along with the process before, I will not to remove it.

3.4.18.           the uniform formula for all operators

Consider all of the formulas of the operators; we can draw a uniform formula for them,

Formula 20: TYPFAC = ROUND((130 + MAXFLTCOLPOS*20+ COLTYPEFAC1*1+ COLTYPEFAC2*COLSEL1 + … + COLTYPEFACn*COLSEL1*…*COLSELn-1 + 20*EFFQRYCOLNUM*MAX(1,ROUND(COLSEL1*…*COLSELn*TABROWS,0))/TABROWS)*TABROWS,0)/TABROWS

Among it,

数据类型

COLTYPEFAC

CHAR

50

VARCHAR

50

NUMBER

150

DATE

300

And here is the selectivity (COLSEL) calculation,

匹配符

COLSEL

><<=>=

1/20

LIKE

1/20

=

1/NDV

<> 

1-1/NDV

IN

NOTINNUM*1/NDV

NOT IN

(1-1/NDV)^NOTINNUM

When operator is LIKE, the COLTYPFAC should be added 50;

When operator is IN or NOT IN, the COLTYPFAC should be calculated as, COLTYPEFAC*(1-1/NDV)^0+COLTYPEFAC*(1-1/NDV)^1+…+COLTYPEFAC*(1-1/NDV)^(INNUM-1)

3.4.19.           Mixed operators

It’s time to study the mixed operators after the uniform formula be found. We assume the formula 20 still work in this case. Then what we need to do is to find out the real filter column process sequence.

With a large number of test data, I found the sequence is actually not such simple:

a > :v1 and b like :v2 and c IN (:c1, :c2) and d NOT IN (:d1, :d2) and e = :v5

TABROWS

10000

EFFQRYCOLNUM

0

 

 

ALLCOLDATATYPE

DATE

E_NDV:

400

ALLCOLNDV

200

 

 

 

12429415

7121440

5307975

530.7975

530.7975

TABROWS

1000000

 

 

 

 

 

537918968

7121440

530797528

530.797528

530.7975

a > :v1 and e = :v5

 

 

 

 

 

 

537871440

7121440

530750000

530.75

530.75

a > :v1 and b like :v2 and e = :v5

 

 

 

 

 

 

537915190

7121440

530793750

530.79375

530.79375

a > :v1 and b like :v2 and c IN (:c1, :c2) and d NOT IN (:d1, :d2) and e = :v5

TABROWS

1000000

EFFQRYCOLNUM

0

ALLCOLNDV

200

ALLCOLDATATYPE

DATE

 

287396640

7121440

280275200

280.2752

280.2751869

a > :v1 and b like :v2 and c IN (:c1, :c2)

 

 

 

 

 

252158940

7121440

245037500

245.0375

245.0375

From these data, the final sequence rule can be concluded as, when compare 2 filter columns, the optimizer will calculate 2 piece of data, one calculated by the COLTYPFAC of A product the COLSEL of B (sequence is A-B), and the other calculated by the COLTYPFAC of B product the COLSEL of A (sequence is B-A), then compare the result, choose the small one.

Top

Copyright ©2005, HelloDBA.Com All reseverd.

Declaration
by fuyuncat