HelloDBA [中文]
Search Internet Search HelloDBA
  Oracle Technology Site. email: fuyuncat@gmail.com  MSN: fuyuncat@hotmail.com Add to circle  acoug  acoug 

Rescue data in the worst situation (2)

[中文]

Author:  fuyuncat

Source:  www.HelloDBA.com

Date:  2013-03-20 05:02:15

Share to  Twitter Facebook GMail Blogger Orkut Google Bookmarks Sina Weibo QQ Renren Tieba Kaixin Douban Taobao

In this case, the situation is more complicated. It's an Oracle 10.2.0.1 database running in Linux. The database was shutdown abnormally; There are only 3 data files can be used, 2 of them are system files, whose file IDs are not in sequence; the other one is the data file contain the data to be rescued. The control files, redo logs, undo files, temp files and other data files are lost. During this process, I encoutered more problems, but I work it out finally. 

Of course, we also need the RDBMS version, DB name, file position/size and other information, and then set SID and password. The content of the parameter is shown as below,

SQL代码
  1. _allow_resetlogs_corruption = true  
  2. control_files              = (/opt/oracle/oradata/ORA10R2/control/ora_control1,  
  3.                               /opt/oracle/oradata/ORA10R2/control/ora_control2,  
  4.                               /opt/oracle/oradata/ORA10R2/control/ora_control3)  
  5. db_name                    = ORA10R2  
  6. db_domain                  = ""  
  7. db_block_size              = 8192  
  8. undo_management            = manual  
  9. undo_tablespace            = ''  
  10. UNDO_RETENTION             = 900  
  11. nls_language               = "AMERICAN"  
  12. nls_territory              = "AMERICA"  
  13. user_dump_dest             = /opt/oracle/admin/ORA10R2/udump  
  14. background_dump_dest       = /opt/oracle/admin/ORA10R2/bdump  
  15. core_dump_dest             = /opt/oracle/admin/ORA10R2/cdump  
  16. sga_max_size               = 200M  
  17. sga_target                 = 200M  

All of these prepare, I start to create a new database:

SQL代码
  1. SQL> CREATE DATABASE ORA10R2  
  2.    USER SYS IDENTIFIED BY oracle  
  3.    USER SYSTEM IDENTIFIED BY oracle  
  4.    LOGFILE GROUP 1 ('/opt/oracle/oradata/ORA10R2/onlinelog/redo01.log'SIZE 20M,  
  5.            GROUP 2 ('/opt/oracle/oradata/ORA10R2/onlinelog/redo02.log'SIZE 20M,  
  6.            GROUP 3 ('/opt/oracle/oradata/ORA10R2/onlinelog/redo03.log'SIZE 20M  
  7.    MAXLOGFILES 5  
  8.    MAXLOGMEMBERS 5  
  9.    MAXLOGHISTORY 1  
  10.    MAXDATAFILES 100  
  11.    MAXINSTANCES 1  
  12.    CHARACTER SET US7ASCII  
  13.    NATIONAL CHARACTER SET AL16UTF16  
  14.    DATAFILE '/opt/oracle/oradata/ORA10R2/datafile/ORA10R2_system_200m' SIZE 180M REUSE  
  15.    --, '/opt/oracle/oradata/ORA10R2/datafile/ORA10R2_system_02' SIZE 200M REUSE  
  16.    EXTENT MANAGEMENT LOCAL  
  17.    SYSAUX DATAFILE '/opt/oracle/oradata/ORA10R2/datafile/sysaux01' SIZE 100M  
  18.    DEFAULT TEMPORARY TABLESPACE tempts1  
  19.       TEMPFILE '/opt/oracle/oradata/ORA10R2/datafile/temp01' SIZE 20M  
  20.    --UNDO TABLESPACE undotbs1  
  21.    --   DATAFILE '/opt/oracle/oradata/ORA10R2/datafile/undotbs01'  
  22.    --   SIZE 500M AUTOEXTEND ON NEXT 5120K MAXSIZE UNLIMITED  
  23.    ;  
  24.   
  25.   2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17  18   19   20   21   22   23  
  26. Database created.  

Because the file IDs of the system data files are not in sequence, we need make sure the new file id align with the old one. And we can use BBED to get the file id of the 2nd system file.

SQL代码
  1. BBED> p kcvfh  
  2. ...  
  3.       ub2 kccfhfno                          @52       0x0008  
  4. ...  

To assign file id as 8, we need create some transient data files to advance the file id, and then add the 2nd system file,

SQL代码
  1. SQL> create tablespace test datafile '/opt/oracle/oradata/ORA10R2/datafile/test3' size 10M, '/opt/oracle/oradata/ORA10R2/datafile/test4' size 10M, '/opt/oracle/oradata/ORA10R2/datafile/test5' size 10M, '/opt/oracle/oradata/ORA10R2/datafile/test6' size 10M, '/opt/oracle/oradata/ORA10R2/datafile/test7' size 10M;  
  2. alter tablespace system add datafile '/opt/oracle/oradata/ORA10R2/datafile/ORA10R2_system_02' size 200M;  
  3. DROP TABLESPACE test INCLUDING CONTENTS AND DATAFILES;  
  4.   
  5. Tablespace created.  
  6.   
  7. SQL>  
  8. Tablespace altered.  
  9.   
  10. SQL>  
  11. Tablespace dropped.  

If I continue the process as same as the previous one in 11g, I will encounter a huge number errors, such as ORA-00600 [4000], ORA-01555, ORA-01111, ORA-01173, ORA-00600 [4049], etc. With further analysing, I found the root cause of these erros is "consistent reads".

During the database open phase, oracle need read the system dictionaries to complete bootstrap. However, the Checkpoint SCN of the new database is quite small, when the process reading the data blocks of the dictionaries, it found the SCN in the ITL is larger than the transaction SCN, and then it read the rollback segments to keep the data consistency. While the undo blocks of rollback segments may not exist or over written, consequently, it raised such errors.

But the truth also brings a question: why did I not encounter the similar problem in the previous test. After analyse the data files, I found it is because the NID process modified the SCN information in file header of the system file. However, I have replaced the file header with the new one. So, I guess the NID process might read the data from dictionary and wrote them to the file header. And we can also get tips from the ORA-00600 error raised in previous case, which indicated the NID process call an internal package to change the DBID.

However, the nid process of 10g did not modify the SCN. Hence, I have to advance the SCN manually. To achive this objective, I used the hidden parameter _minimum_giga_scn. One thing we need note is that it will fail if the value is too small. Correspondingly, we can find below entries from alert log,

SQL代码
  1. Current SCN is not changed: _minimum_giga_scn (scn 274877906944) is too small  

I set _minimum_giga_scn = 1024, and rebound the database, the alert log indicate that the SCN is advanced successfully.

SQL代码
  1. Advancing SCN to 1099511627776 according to _minimum_giga_scn  

Then, rename the new system files and copy the old data files to current data file folder,

SQL代码
  1. [oracle@server1 ORA10R2]$ mv datafile/ORA10R2_system_200m datafile/ORA10R2_system_200m_new  
  2. [oracle@server1 ORA10R2]$ mv datafile/ORA10R2_system_02 datafile/ORA10R2_system_02_new  
  3. [oracle@server1 ORA10R2]$ cp ../ORA10R2_COPY/datafile/ORA10R2_system_200m ./datafile/  
  4. [oracle@server1 ORA10R2]$ cp ../ORA10R2_COPY/datafile/ORA10R2_system_02 ./datafile/  
  5. [oracle@server1 ORA10R2]$ cp ../ORA10R2_COPY/datafile/ORA10R2_example_50m ./datafile/  

And execute NID to modify the DBID (But I don't think this step is necessary now)

SQL代码
  1. [oracle@server1 ORA10R2]$ nid target=sys/oracle  
  2.   
  3. DBNEWID: Release 10.2.0.1.0 - Production on Tue Mar 19 00:44:33 2013  
  4. ... ...  
  5. Database ID for database ORA10R2 changed to 4153675250.  
  6. All previous backups and archived redo logs for this database are unusable.  
  7. Database has been shutdown, open database with RESETLOGS option.  
  8. Succesfully changed database ID.  
  9. DBNEWID - Completed succesfully.  

Re-create the control files, open database RESETLOGS:

SQL代码
  1. SQL> CREATE CONTROLFILE REUSE DATABASE ORA10R2 RESETLOGS NOARCHIVELOG  
  2.     MAXLOGFILES 5  
  3.     MAXLOGMEMBERS 5  
  4.     MAXDATAFILES 100  
  5.     MAXINSTANCES 1  
  6.     MAXLOGHISTORY 292  
  7. LOGFILE  
  8.   GROUP 1 '/opt/oracle/oradata/ORA10R2/onlinelog/redo01.log' SIZE 20M,  
  9.   GROUP 2 '/opt/oracle/oradata/ORA10R2/onlinelog/redo02.log' SIZE 20M,  
  10.   GROUP 3 '/opt/oracle/oradata/ORA10R2/onlinelog/redo03.log' SIZE 20M  
  11. DATAFILE  
  12.   '/opt/oracle/oradata/ORA10R2/datafile/ORA10R2_system_200m' SIZE 180M  
  13.   , '/opt/oracle/oradata/ORA10R2/datafile/ORA10R2_system_02' SIZE 200M  
  14.   --, '/opt/oracle/oradata/ORA10R2/datafile/sysaux01' size 100M  
  15.   --, '/opt/oracle/oradata/ORA10R2/datafile/undotbs01'  
  16. CHARACTER SET US7ASCII  
  17. ;  
  18.   2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17  
  19. Control file created.  
  20.   
  21. SQL> recover database using backup controlfile until cancel;  
  22. ORA-00279: change 1099511627897 generated at 03/19/2013 00:43:31 needed for  
  23. thread 1  
  24. ORA-00289: suggestion : /opt/oracle/product/10.2.0/db/dbs/arch1_2_810434521.dbf  
  25. ORA-00280: change 1099511627897 for thread 1 is in sequence #2  
  26.   
  27.   
  28. Specify log: {<RET>=suggested | filename | AUTO | CANCEL}  
  29. cancel  
  30. Media recovery cancelled.  
  31. SQL> alter database open resetlogs;  
  32. alter database open resetlogs  
  33. *  
  34. ERROR at line 1:  
  35. ORA-01092: ORACLE instance terminated. Disconnection forced  

Opening database failed. The errors from alert log are as below,

SQL代码
  1. ORA-01177: data file does not match dictionary - probably old incarnation  
  2. ORA-01110: data file 1: '/opt/oracle/oradata/ORA10R2/datafile/ORA10R2_system_200m'  

I did it again with 10046 event enabled, and I found the SQL caused the error and the waits.

SQL代码
  1. PARSING IN CURSOR #2 len=122 dep=1 uid=0 oct=3 lid=0 tim=1331710783410619 hv=1330125001 ad='2c3341c8'  
  2. select blocks,NVL(ts#,-1),status$,NVL(relfile#,0),maxextend,inc, crscnwrp,crscnbas,NVL(spare1,0) from file$ where file#=:1  
  3. END OF STMT  
  4. PARSE #2:c=0,e=300,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1331710783410615  
  5. BINDS #2:  
  6. kkscoacd  
  7.  Bind#0  
  8.   oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00  
  9.   oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0  
  10.   kxsbbbfp=b71e3928  bln=22  avl=02  flg=05  
  11.   value=1  
  12. EXEC #2:c=1000,e=574,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1331710783411304  
  13. WAIT #2: nam='db file sequential read' ela= 14 file#=1 block#=258 blocks=1 obj#=-1 tim=1331710783411361  
  14. WAIT #2: nam='db file sequential read' ela= 10 file#=1 block#=114 blocks=1 obj#=-1 tim=1331710783411419  
  15. ...  
  16. ORA-01177: data file does not match dictionary - probably old incarnation  
  17. ORA-01110: data file 1: '/opt/oracle/oradata/ORA10R2/datafile/ORA10R2_system_200m'  

Obviousely, the process read data from file$, and compare them with the information in file header, and found something unmatched, so, it raised the errors.

Before to solve the problem, I mounted the database and recovered the data files,

SQL代码
  1. SQL> startup mount  
  2. SQL> recover datafile 1  
  3. Media recovery complete.  
  4. SQL> recover datafile 8  
  5. Media recovery complete.  

Using BBED to open #114 block who belogns to file$, get the 1st record, which is the meta data of the 1st data file.

SQL代码
  1. BBED> set file 1 block 114  
  2.         FILE#           1  
  3.         BLOCK#          114  
  4. BBED> p *kdbr[0]  
  5. BBED> x /rnnnnnnnnnnnnnnnnnnnnnnnnnnn  
  6. rowdata[678]                                @8153  
  7. ------------  
  8. flag@8153: 0x2c (KDRHFL, KDRHFF, KDRHFH)  
  9. lock@8154: 0x00  
  10. cols@8155:   11  
  11.   
  12. col    0[2] @8156: 1 -- FILE#  
  13. col    1[2] @8159: 2 -- STATUS$  
  14. col    2[4] @8162: 23040 -- BLOCKS  
  15. col    3[1] @8167: 0 -- TS#  
  16. col    4[2] @8169: 1 -- RELFILE#  
  17. col    5[1] @8172: 0 -- MAXEXTEND  
  18. col    6[1] @8174: 0 -- INC  
  19. col    7[1] @8176: 0 -- CRSCNWRP  
  20. col    8[2] @8178: 8 -- CRSCNBAS  
  21. col    9[0] @8181: *NULL-- OWNERINSTANCE  
  22. col   10[5] @8182: 4194306 -- SPARE1  

Open the file header, I found the creation SCN not matched.

SQL代码
  1. BBED> set file 1 block 1  
  2. BBED> map  
  3. BBED> p kcvfh  
  4. struct kcvfh, 676 bytes                     @0  
  5.    struct kcvfhbfh, 20 bytes                @0  
  6.       ub1 type_kcbh                         @0        0x0b  
  7.       ub1 frmt_kcbh                         @1        0xa2  
  8.       ub1 spare1_kcbh                       @2        0x00  
  9. ...  
  10.    struct kcvfhcrs, 8 bytes                 @100  
  11.       ub4 kscnbas                           @100      0x00000006  -- CRSCNBAS  
  12.       ub2 kscnwrp                           @104      0x0000      -- CRSCNWRP  
  13. ...  
  14.    struct kcvfhckp, 36 bytes                @484  
  15.       struct kcvcpscn, 8 bytes              @484  
  16.          ub4 kscnbas                        @484      0x00006fa4  -- CKPCNBAS  
  17.          ub2 kscnwrp                        @488      0x0100      -- CKPCNWRP  
  18. ...  

To avoid this problem, we need make sure the data in both sides matched. Therefore, I modified the data in the dictionary.

Using DUMP see what data I need input

SQL代码
  1. SQL> select dump(6,16) from dual;  
  2.   
  3. DUMP(8,16)  
  4. -----------------  
  5. Typ=2 Len=2: c1,7  

And using BBED modify the record

SQL代码
  1. BBED> set file 1 block 114  
  2. BBED> dump offset 8178 count 4  
  3. BBED> modify /x 02c107ff  
  4. BBED> sum apply  

And then modify the record of the 8th file

File header of the 8th data file

SQL代码
  1. struct kcvfhcrs, 8 bytes                 @100  
  2.   ub4 kscnbas                           @100      0x00001c64  
  3.   ub2 kscnwrp                           @104      0x0000  

Dictionary data:

SQL代码
  1. col    0[2] @7478: 8  
  2. col    1[2] @7481: 2  
  3. col    2[3] @7484: 25600  
  4. col    3[1] @7488: 0  
  5. col    4[2] @7490: 8  
  6. col    5[1] @7493: 0  
  7. col    6[1] @7495: 0  
  8. col    7[1] @7497: 0  
  9. col    8[4] @7499: 140113  
  10. col    9[0] @7504: *NULL*  
  11. col   10[5] @7505: 33554434  

Result of dump:

XML/HTML代码
  1. SQL> select dump(7268,16) from dual;  
  2.   
  3. DUMP(7268,16)  
  4. ---------------------  
  5. Typ=2 Len=3: c2,49,45  
  6.   
  7. SQL> select dump(140113,16) from dual;  
  8.   
  9. DUMP(140113,16)  
  10. ---------------------  
  11. Typ=2 Len=4: c3,f,2,e  

Modify the data using BBED,

SQL代码
  1. BBED> dump offset 7499 count 6  
  2. BBED> modify /x 04c24945  
  3. BBED> sum apply  

After all of these done, the database can be opened successfully, data form old dictionaries can be read now,

SQL代码
  1. SQL> alter database open;  
  2.   
  3. Database altered.  
  4.   
  5. SQL> select file_id, tablespace_name from dba_data_files where tablespace_name like '%EXAMPLE%';  
  6.   
  7.    FILE_ID TABLESPACE_NAME  
  8. ---------- ------------------------------  
  9.          4 EXAMPLE  
  10.   
  11. SQL> select file#, name from v$datafile where file#=4;  
  12.   
  13.      FILE# NAME  
  14. ---------- --------------------------------------------------------------------------------  
  15.          4 /opt/oracle/product/10.2.0/db/dbs/MISSING00004  

Rename the datafile, recover the database and alter the data file online

SQL代码
  1. SQL> ALTER DATABASE RENAME FILE '/opt/oracle/product/10.2.0/db/dbs/MISSING00004' TO '/opt/oracle/oradata/ORA10R2/datafile/ORA10R2_example_50m';  
  2.   
  3. Database altered.  
  4.   
  5. SQL> shutdown  
  6. ... ...  
  7. SQL> recover database until cancel;  
  8. Media recovery complete.  
  9. SQL> ALTER DATABASE DATAFILE '/opt/oracle/oradata/ORA10R2/datafile/ORA10R2_example_50m' ONLINE;  
  10.   
  11. Database altered.  
  12.   
  13. SQL> alter database open resetlogs;  
  14. alter database open resetlogs  
  15. *  
  16. ERROR at line 1:  
  17. ORA-01122: database file 4 failed verification check  
  18. ORA-01110: data file 4: '/opt/oracle/oradata/ORA10R2/datafile/ORA10R2_example_50m'  
  19. ORA-01206: file is not part of this database - wrong database id  

Here I encountered another error, which indicate the DB ID of the data file does not match to the database. I get the DBID from file header of other data file.

SQL代码
  1. BBED> set file 1 block 1  
  2.         FILE#           1  
  3.         BLOCK#          1  
  4.   
  5. BBED> p kcvfh  
  6. ...   
  7.       ub4 kccfhdbi                          @28       0xf7940df2  
  8. ...  
  9. BBED> dump offset 28 count 4  
  10.  File: /opt/oracle/oradata/ORA10R2/datafile/ORA10R2_system_200m (1)  
  11.  Block: 1                Offsets:   28 to   31           Dba:0x00400001  
  12. ------------------------------------------------------------------------  
  13.  f20d94f7  

And use BBED to modify the data file to be online

SQL代码
  1. BBED> set file 4 block 1  
  2.         FILE#           4  
  3.         BLOCK#          1  
  4.   
  5. BBED> p kcvfh  
  6. ...  
  7.       ub4 kccfhdbi                          @28       0xb9888c45  
  8. ...  
  9. BBED> dump offset 30 count 2  
  10.  File: /opt/oracle/oradata/ORA10R2/datafile/ORA10R2_example_50m (4)  
  11.  Block: 1                Offsets:   30 to   31           Dba:0x01000001  
  12. ------------------------------------------------------------------------  
  13.  88b9  
  14. BBED> modify /x 94f7  
  15.  File: /opt/oracle/oradata/ORA10R2/datafile/ORA10R2_example_50m (4)  
  16.  Block: 1                Offsets:   30 to   31           Dba:0x01000001  
  17. ------------------------------------------------------------------------  
  18.  94f7  
  19. BBED> dump offset 28 count 4  
  20.  File: /opt/oracle/oradata/ORA10R2/datafile/ORA10R2_example_50m (4)  
  21.  Block: 1                Offsets:   28 to   31           Dba:0x01000001  
  22. ------------------------------------------------------------------------  
  23.  f29d94f7  
  24. BBED> modify /x f20d  
  25.  File: /opt/oracle/oradata/ORA10R2/datafile/ORA10R2_example_50m (4)  
  26.  Block: 1                Offsets:   28 to   31           Dba:0x01000001  
  27. ------------------------------------------------------------------------  
  28.  f20d94f7  
  29.   
  30. BBED> sum apply  
  31. Check value for File 4, Block 1:  
  32. current = 0x4460, required = 0x4460  

Then the database was opened successfully.

SQL代码
  1. SQL> alter database open resetlogs;  
  2.   
  3. Database altered.  

Create the temporary tablespace and modify the users,

SQL代码
  1. SQL> CREATE TEMPORARY TABLESPACE temp2 TEMPFILE '/opt/oracle/oradata/ORA10R2/datafile/temp02' size 100M;  
  2.   
  3. Tablespace created.  
  4.   
  5. SQL> select 'ALTER USER '||username||' TEMPORARY TABLESPACE TEMP2;' from dba_users;  
  6.   
  7. 'ALTERUSER'||USERNAME||'TEMPORARYTABLESPACETEMP2;'  
  8. ---------------------------------------------------------------------  
  9. ALTER USER STMADMIN TEMPORARY TABLESPACE TEMP2;  
  10. ... ...  
  11.   
  12. SQL> ALTER USER STMADMIN TEMPORARY TABLESPACE TEMP2;  
  13. ... ...  
  14. User altered.  

Export the data using EXP.

SQL代码
  1. [oracle@server1 ORA10R2]$ exp system/oracle tablespaces=EXAMPLE file=resecue.dat log=resecue.log  
  2. ... ...  
  3. . . exporting table                         USR_MV         11 rows exported  
  4. . exporting referential integrity constraints  
  5. . exporting triggers  
  6. Export terminated successfully without warnings.  

Now, the data is rescued succesfully.

Bear in mind, both data rescuing processes were completed in an testing environment, while the real case may be more complicated. Anyway, this method could be the last choice if you are trapped in such situation. To guarantee the saftety of the data, backup plan and disaster recover solution are always the first choice.

--- Wei Huang ---

Top

Copyright ©2005, HelloDBA.Com All reseverd.

Declaration
by fuyuncat