Malicious deletion of bootstrap$ rows prevents the database from starting

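A customer's 10.2.0.5 database could not be started after it was shut down; it reported ORA-00704 and ORA-00702. Our in-house ORA-702_Recovery Tool can repair this situation in one step. Download: ORA-702_Recovery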

Fri May 18 22:42:26  2018
ALTER DATABASE OPEN
Fri May 18 22:42:27  2018
Beginning crash recovery of 1 threads
 parallel recovery started with 7 processes
Fri May 18 22:42:27  2018
Started redo scan
Fri May 18 22:42:27  2018
Completed redo scan
 1 redo blocks read, 0 data blocks need recovery
Fri May 18 22:42:27  2018
Started redo application at
 Thread 1: logseq 2, block 2, scn 8448162573
Fri May 18 22:42:27  2018
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0: D:\DATABASE\xifenfei\REDO02.LOG
Fri May 18 22:42:27  2018
Completed redo application
Fri May 18 22:42:27  2018
Completed crash recovery at
 Thread 1: logseq 2, block 3, scn 8448182575
 0 data blocks read, 0 data blocks written, 1 redo blocks read
Fri May 18 22:42:28  2018
LGWR: STARTING ARCH PROCESSES
ARC0 started with pid=23, OS id=3188
ARC1 started with pid=24, OS id=3168
ARC2 started with pid=25, OS id=996
ARC3 started with pid=26, OS id=432
ARC4 started with pid=27, OS id=3728
Fri May 18 22:42:28  2018
ARC0: Archival started
ARC1: Archival started
ARC5 started with pid=28, OS id=2876
Fri May 18 22:42:28  2018
ARC2: Archival started
ARC3: Archival started
ARC4: Archival started
ARC5: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
Fri May 18 22:42:28  2018
Thread 1 advanced to log sequence 3 (thread open)
Thread 1 opened at log sequence 3
  Current log# 3 seq# 3 mem# 0: D:\DATABASE\xifenfei\REDO03.LOG
Successful open of redo thread 1
Fri May 18 22:42:28  2018
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Fri May 18 22:42:28  2018
ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
Fri May 18 22:42:28  2018
ARC2: Becoming the heartbeat ARCH
Fri May 18 22:42:28  2018
SMON: enabling cache recovery
Fri May 18 22:42:28  2018
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\udump\xifenfei_ora_3148.trc:
ORA-00704: 引导程序进程失败
ORA-00702: 引导程序版本 '' 与版本 '8.0.0.0.0' 不一致

Fri May 18 22:42:28  2018
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704
Instance terminated by USER, pid = 3148
ORA-1092 signalled during: ALTER DATABASE OPEN...

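Based on previous recovery experience, ORA-00702: bootstrap verison '' inconsistent with version '8.0.0.0.0' most likely means that the bootstrap$ table is damaged.
A dbv check of the SYSTEM datafile confirmed that it contains no corrupt blocks.
[Screenshot: dbv-system]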


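bbed analysis confirms the rows were deleted
The datafile was copied to a local machine and analyzed with bbed, which confirmed that the rows are gone: the row directory entry still exists, but the row header carries the deleted flag.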

BBED> map
 File: d:/system01.dbf (0)
 Block: 379                                   Dba:0x00000000
------------------------------------------------------------
 KTB Data Block (Table/Cluster)

 struct kcbh, 20 bytes                      @0

 struct ktbbh, 48 bytes                     @20

 struct kdbh, 14 bytes                      @68

 struct kdbt[1], 4 bytes                    @82

 sb2 kdbr[24]                               @86

 ub1 freespace[1158]                        @134

 ub1 rowdata[6896]                          @1292

 ub4 tailchk                                @8188


BBED> p *kdbr[0]
rowdata[6875]
-------------
ub1 rowdata[6875]                           @8167     0x3c

BBED> x /rnnc
rowdata[6875]                               @8167
-------------
flag@8167: 0x3c (KDRHFL, KDRHFF, KDRHFD, KDRHFH)
lock@8168: 0x01
cols@8169:    0
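
The flag byte 0x3c in the dump above is what marks the row as deleted: bbed decodes it as KDRHFL, KDRHFF, KDRHFD, KDRHFH, and KDRHFD is the "deleted" bit. As a small sanity check, the arithmetic can be reproduced on DUAL in any healthy instance (not the damaged one). The mask values used here (0x20 head, 0x10 deleted, 0x08 first, 0x04 last) are the commonly documented row-flag bits, stated as an assumption rather than taken from this post.

```sql
-- Pure arithmetic on DUAL; does not touch the damaged database.
-- Mask values are the commonly documented row-header flag bits (assumption).
select to_number('3C', 'XX')             as flag_dec,     -- 60
       bitand(to_number('3C', 'XX'), 32) as head_bit,     -- 32: KDRHFH set
       bitand(to_number('3C', 'XX'), 16) as deleted_bit,  -- 16: KDRHFD set
       bitand(to_number('3C', 'XX'),  8) as first_bit,    --  8: KDRHFF set
       bitand(to_number('3C', 'XX'),  4) as last_bit      --  4: KDRHFL set
  from dual;
```

A row that had not been deleted would normally show 0x2c here, that is, the same byte with the 0x10 bit cleared.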

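Tracing the root cause
Someone had injected a malicious script into the database that deleted the rows in bootstrap$, so the database could no longer start once it had been shut down.
[Screenshot: delete-bootstrap$]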


Fix
The data dictionary was repaired with Oracle bbed, after which the database started normally.


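Tuning SQL with _unnest_subquery

A complex SQL query ran very slowly because it correlates a large number of EXISTS and NOT EXISTS subqueries. The core part is extracted here for demonstration.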

SQL> explain plan for   select  
  2   a.aab034, a.aac001
  3    from si_dp.ac01_ac02 a
  4   where exists (select 1
  5            from ic40
  6           where aac001 = a.aac001
  7             and aae045 <= '201803'
  8             and aae120 = '0')
  9     and not exists (select 1
 10            from ic15
 11           where aac001 = a.aac001
 12             and aae002 <= '201803')
 13     and not EXISTS (select aab001
 14            from ab01
 15           where aab019 in ('91', '93')
 16             AND aab001 = a.aab001)
 17    and exists (select 1
 18            from ac13
 19           where aac001 = a.aac001
 20             and aae140 = '11'
 21             and aae114 in ('0', '1')
 22             and aae002 <= '201803')
 23     AND EXISTS (SELECT 1
 24            FROM AC13
 25           WHERE AAC001 = A.AAC001
 26             and aae140 = '11'
 27             AND AAE143 = '02'
 28             AND AAE003 < '201707'
 29             AND AAE002 BETWEEN '201801' AND '201803'
 30             and aae114 = '1')
 31     AND not EXISTS (SELECT 1
 32         FROM AC13
 33           WHERE AAC001 = A.AAC001
 34             and aae140 = '11'
 35          AND AAE002 < '201801')
 36     AND not EXISTS (SELECT 1
 37            FROM ac02
 38           WHERE AAC001 = A.AAC001
 39             and aae140 = '11'
 40             AND AAE036 < date '2018-1-1');
 
Explained.

Elapsed: 00:00:00.36

SQL> select * from table (dbms_xplan.display);


PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name               | Rows  | Bytes |TempSpc| Cost (%CPU)|
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |                    |     1 |   202 |       | 11172   (2)|
|   1 |  NESTED LOOPS SEMI                |                    |     1 |   202 |       | 11172   (2)|
|   2 |   NESTED LOOPS ANTI               |                    |     1 |   175 |       | 11168   (2)|
|   3 |    NESTED LOOPS SEMI              |                    |     1 |   150 |       | 11164   (2)|
|   4 |     NESTED LOOPS ANTI             |                    |     1 |   126 |       | 11160   (2)|
|   5 |      NESTED LOOPS SEMI            |                    |     1 |   104 |       | 11158   (2)|
|   6 |       NESTED LOOPS ANTI           |                    |     1 |    67 |       | 11145   (2)|
|   7 |        HASH JOIN ANTI             |                    |     1 |    50 |  8640K| 11143   (2)|
|   8 |         TABLE ACCESS FULL         | AC01_AC02          |   245K|  5755K|       |   356   (2)|
|   9 |         TABLE ACCESS FULL         | AC02               |   559K|    13M|       |  9346   (2)|
|  10 |        TABLE ACCESS BY INDEX ROWID| AB01               |     2 |    34 |       |     2   (0)|
|  11 |         INDEX UNIQUE SCAN         | PK_AB01            |     1 |       |       |     1   (0)|
|  12 |       TABLE ACCESS BY INDEX ROWID | AC13               |   325K|    11M|       |    13   (0)|
|  13 |        INDEX RANGE SCAN           | I_AC13_AAE143      |   446 |       |       |     4   (0)|
|  14 |      INDEX RANGE SCAN             | PK_IC15            |  1771K|    37M|       |     2   (0)|
|  15 |     TABLE ACCESS BY INDEX ROWID   | IC40               |    17M|   395M|       |     4   (0)|
|  16 |      INDEX RANGE SCAN             | PK_IC40            |     1 |       |       |     3   (0)|
|  17 |    TABLE ACCESS BY INDEX ROWID    | AC13               |    51M|  1236M|       |     4   (0)|
|  18 |     INDEX RANGE SCAN              | RELATION_233112_FK |     3 |       |       |     3   (0)|
|  19 |   TABLE ACCESS BY INDEX ROWID     | AC13               |    52M|  1350M|       |     4   (0)|
|  20 |    INDEX RANGE SCAN               | RELATION_233112_FK |     3 |       |       |     3   (0)|
-----------------------------------------------------------------------------------------------------

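On a 10.2.0.3 system this SQL ran for more than ten hours without returning a result. The vendor reported that in most of their customers' 11.2 environments it finishes in about ten-odd minutes, that they had never seen this behavior before, and asked us to tune it. Looking at the plan, my first reaction was that the long chain of NESTED LOOPS was probably the problem, and I suspected wrong statistics. After regathering statistics, however, the plan was unchanged. So I started thinking about how to stop the query from being executed as a deep chain of nested loops. I remembered that _unnest_subquery controls the subquery unnesting transformation and has defaulted to true since 9i, so I tried setting it to false as a test.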

SQL> alter session set "_unnest_subquery"=false;

Session altered.

Elapsed: 00:00:00.00
SQL> explain plan for   select 
  2   a.aab034, a.aac001
  3    from si_dp.ac01_ac02 a
  4   where exists (select 1
  5            from ic40
  6           where aac001 = a.aac001
  7             and aae045 <= '201803'
  8             and aae120 = '0')
  9     and not exists (select 1
 10            from ic15
 11           where aac001 = a.aac001
 12             and aae002 <= '201803')
 13     and not EXISTS (select aab001
 14            from ab01
 15           where aab019 in ('91', '93')
 16             AND aab001 = a.aab001)
 17    and exists (select 1
 18            from ac13
 19          where aac001 = a.aac001
 20             and aae140 = '11'
 21             and aae114 in ('0', '1')
 22             and aae002 <= '201803')
 23     AND EXISTS (SELECT 1
 24            FROM AC13
 25           WHERE AAC001 = A.AAC001
 26             and aae140 = '11'
 27             AND AAE143 = '02'
 28             AND AAE003 < '201707'
 29             AND AAE002 BETWEEN '201801' AND '201803'
 30             and aae114 = '1')
 31     AND not EXISTS (SELECT 1
 32            FROM AC13
 33           WHERE AAC001 = A.AAC001
 34             and aae140 = '11'
 35             AND AAE002 < '201801')
 36     AND not EXISTS (SELECT 1
 37            FROM ac02
 38           WHERE AAC001 = A.AAC001
 39             and aae140 = '11'
 40             AND AAE036 < date '2018-1-1');
Explained.

Elapsed: 00:00:00.07

SQL> select * from table (dbms_xplan.display);


PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------
| Id  | Operation                     | Name             | Rows  | Bytes |TempSpc| Cost (%CPU)|
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                  |   185K|    19M|       |  2991K  (2)|
|   1 |  FILTER                       |                  |       |       |       |            |
|   2 |   HASH JOIN RIGHT SEMI        |                  |   185K|    19M|    16M|   758K  (3)|
|   3 |    TABLE ACCESS BY INDEX ROWID| AC13             |   353K|    12M|       |  4556   (1)|
|   4 |     INDEX SKIP SCAN           | I_AC13_AAB001    | 23608 |       |       |  2287   (1)|
|   5 |    HASH JOIN SEMI             |                  |   201K|    14M|    11M|   751K  (3)|
|   6 |     HASH JOIN SEMI            |                  |   201K|  9452K|  8640K|   123K  (3)|
|   7 |      TABLE ACCESS FULL        | AC01_AC02        |   245K|  5755K|       |   357   (2)|
|   8 |      TABLE ACCESS FULL        | IC40             |    21M|   481M|       | 86122   (3)|
|   9 |     TABLE ACCESS FULL         | AC13             |    52M|  1350M|       |   530K  (3)|
|  10 |   INDEX RANGE SCAN            | PK_IC15          |     2 |    44 |       |     3   (0)|
|  11 |   VIEW                        | index$_join$_009 |     1 |    17 |       |     3  (34)|
|  12 |    HASH JOIN                  |                  |       |       |       |            |
|  13 |     INDEX RANGE SCAN          | PK_AB01          |     1 |    17 |       |     2   (0)|
|  14 |     INLIST ITERATOR           |                  |       |       |       |            |
|  15 |      INDEX RANGE SCAN         | IDX_AB01_AAB019  |     1 |    17 |       |     8   (0)|
|  16 |   TABLE ACCESS BY INDEX ROWID | AC13             |     2 |    50 |       |     5   (0)|
|  17 |    INDEX RANGE SCAN           | I_AC13_SEARCH    |   152 |       |       |     4   (0)|
|  18 |   TABLE ACCESS BY INDEX ROWID | AC02             |     1 |    26 |       |     4   (0)|
|  19 |    INDEX RANGE SCAN           | PK_AC02          |     1 |       |       |     3   (0)|
-----------------------------------------------------------------------------------------------

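We asked the developers to set this parameter and then run the SQL. It returned in under 3 minutes, so the task was done. This SQL still has room for further tuning, but since the requirement was already met we left it at that.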
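
Because _unnest_subquery is a hidden parameter and changes how every statement in the session is optimized, it may be worth limiting its reach. The sketch below shows two options under stated assumptions: restoring the default right after the report query, or leaving the parameter alone and using the documented NO_UNNEST hint inside an individual subquery. The second option was not tested against this workload, and the resulting plan is not guaranteed to match the one shown above.

```sql
-- Option 1: limit the change to this session and restore the default afterwards.
alter session set "_unnest_subquery" = false;
-- ... run the report query here ...
alter session set "_unnest_subquery" = true;

-- Option 2 (untested on this schema): keep the default and ask the optimizer
-- not to unnest one specific subquery. Only one predicate of the original
-- query is shown to keep the example short.
select a.aab034, a.aac001
  from si_dp.ac01_ac02 a
 where not exists (select /*+ no_unnest */ 1
                     from ic15
                    where aac001 = a.aac001
                      and aae002 <= '201803');
```

Whichever variant is used, compare the output of dbms_xplan.display before and after, as in the transcripts above, to confirm that the plan actually changed.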


ORA-19821 failure analysis

Database errors
On startup, the database reports ORA-00283 and ORA-19821.

SQL> recover database;
ORA-00283: recovery session canceled due to errors
ORA-19821: an intentionally corrupt log file was found


SQL> recover datafile 1;
ORA-00283: recovery session canceled due to errors
ORA-19821: an intentionally corrupt log file was found
Thu May 03 12:06:51 2018
Recovery of Online Redo Log: Thread 1 Group 1 Seq 1 Reading mem 0
  Mem# 0: D:\APP\SOHTDB\ORADATA\xifenfei\REDO01.LOG
Media Recovery failed with error 19821
Errors in file d:\app\sohtdb\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_pr00_660.trc:
ORA-00283: recovery session canceled due to errors
ORA-19821: an intentionally corrupt log file was found
Slave exiting with ORA-283 exception
Errors in file d:\app\sohtdb\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_pr00_660.trc:
ORA-00283: recovery session canceled due to errors
ORA-19821: an intentionally corrupt log file was found
ORA-283 signalled during: ALTER DATABASE RECOVER  database  ...

Cause of ORA-19821
This error is relatively rare. According to MOS, it is caused by setting _disable_logging = TRUE.
[Screenshot: ORA-19821]


Checking the alert log confirmed that the _disable_logging parameter had indeed been manually set to true.
[Screenshot: _disable_logging]
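
Besides the alert log, the current value of a hidden parameter can usually be read from the fixed tables. The query below is a commonly used sketch rather than anything taken from the MOS note. It assumes a connection as SYS to a started instance (the x$ tables are not visible to ordinary users).

```sql
-- Connect as SYS; x$ksppi holds parameter names, x$ksppcv the current values.
select i.ksppinm  as name,
       v.ksppstvl as value,
       v.ksppstdf as is_default   -- 'FALSE' means the value was set explicitly
  from x$ksppi i, x$ksppcv v
 where i.indx = v.indx
   and i.ksppinm = '_disable_logging';
```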

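Solution
According to Oracle's description, this situation cannot be recovered by conventional means. Since the parameter exists to suppress redo generation, the only option is to use hidden parameters to skip instance recovery and force the database open. Errors such as ORA-600 [2662] are very likely to appear along the way.
[Screenshot: _disable_logging_mos]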

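This parameter is set to suppress redo generation, so whenever the database is not shut down cleanly (host power loss, database crash, shutdown abort, and so on), it is very likely that the database will fail to start. ***Never, under any circumstances, set _disable_logging = TRUE in a production environment.***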
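
If the parameter is found in a system that is still running normally, one hedged way to remove it is to reset it in the spfile and restart cleanly. This sketch assumes the instance uses an spfile. It does not repair redo that has already been written while the parameter was active; it only stops the problem from recurring.

```sql
-- Assumes an spfile is in use; takes effect at the next restart.
alter system reset "_disable_logging" scope=spfile sid='*';

-- Restart cleanly so the database is consistent before redo logging resumes.
shutdown immediate
startup
```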
References: Ora-19821 during the recovery (Doc ID 1217143.1)
Init.ora Parameter "_DISABLE_LOGGING" [Hidden] Reference Note (Doc ID 29552.1)
