年归档:2022

rac kill 大事物后回滚慢,smon等待DFS lock handle和enq: TX – contention

Customer killed a large DELETE statement of a large table in instance 1 in high workload period in RAC environment of 2 instances.
Transaction recovery did not yet complete for more than 1 hour by querying V$FAST_START_TRANSACTIONS.
Update on the same table on instance 2 hung more than 1 hour.
Transaction recovery complete quickly after shutdown instance 2.
ASH shows high waiting on “enq: TX – row lock contention” and “transaction” without blocking session until instance 2 shutdown after DELETE session killed in instance 1

SQL> select instance_number,program,event,BLOCKING_INST_ID b_inst,count(session_id) session_cnt,
count(BLOCKING_SESSION) b_session_cnt,min(sample_time) start_time,max(sample_time) end_time,count(*) cnt
  2  from DBA_HIST_ACTIVE_SESS_HISTORY where sql_id='d1xg4v7rqx3kt' and instance_number=2
group by instance_number,program,event,BLOCKING_INST_ID
order by instance_number,start_time
  3    4    5  ;

INSTANCE_NUMBER PROGRAM            EVENT                         B_INST SESSION_CNT B_SESSION_CNT START_TIME END_TIME       CNT
--------------- ------------------ ----------------------------- ------ ----------- ------------- ---------- ---------- -------
              2 JDBC Thin Client   enq: TX - row lock contention             728939             0 19:20:34   21:04:25    728939
              2 JDBC Thin Client   transaction                               489837             0 20:12:44   21:04:25    489837

ASH shows that all sessions of the “enq: TX – row lock contention” without blocking session on instance 2 were waiting for the transaction of the Killed DELETE on instance 1

SQL> select instance_number,event,trunc(p2/65536) XIDUSN, trunc(mod(p2,65536)) XIDSLOT, 
p3 XIDSQN,count(session_id) session_cnt,count(BLOCKING_SESSION) b_session_cnt,min(sample_time) start_time,max(sample_time) end_time,count(*) cnt
from DBA_HIST_ACTIVE_SESS_HISTORY where sql_id='d1xg4v7rqx3kt' and instance_number=2 and 
to_char(sample_time,'HH24:MI') >= '20:12' AND BLOCKING_SESSION IS NULL AND EVENT='enq: TX - row lock contention'
group by instance_number,event,trunc(p2/65536) , trunc(mod(p2,65536)) , p3
order by start_time
;
  2    3    4    5

INSTANCE_NUMBER EVENT                            XIDUSN  XIDSLOT   XIDSQN SESSION_CNT B_SESSION_CNT START_TIME END_TIME      CNT
--------------- ------------------------------ -------- -------- -------- ----------- ------------- ---------- ---------- ------
              2 enq: TX - row lock contention       932       15   623055      717850             0 20:12:55   21:04:25   717850
SQL>

ASH shows that SMON of instance 1 was waiting for “DFS lock handle” and “enq: TX – contention” among DELETE session killed in instance 1 and instance 2 shutdown.

SQL> select instance_number,session_id,session_serial#,program,event,BLOCKING_SESSION b_sid,
BLOCKING_SESSION_SERIAL# b_serial,BLOCKING_INST_ID b_inst,min(sample_time) start_time,
max(sample_time) end_time,count(*) cnt
 from DBA_HIST_ACTIVE_SESS_HISTORY where upper(program) like '%SMON%'
group by instance_number,session_id,session_serial#,program,event,BLOCKING_SESSION ,BLOCKING_SESSION_SERIAL# ,BLOCKING_INST_ID
order by instance_number,start_time
;

INSTANCE_NUMBER SESSION_ID SESSION_SERIAL# PROGRAM                       EVENT                 B_SID  B_SERIAL B_INST START_TIME END_TIME   CNT
--------------- ---------- --------------- ----------------------------- --------------------- ----- --------- ------ ---------- ---------- ---
              1       3761               1 oracle@<instance_name> (SMON) DFS lock handle                              19:05:24   21:05:07   128
              1       3761               1 oracle@<instance_name> (SMON) enq: TX - contention                         20:14:31   21:04:47    50

The P1 of “DFS lock handle” shows that SMON was requesting TA enqueue through cross instance calls which is used for serializing operations on undo segments and undo tablespaces

SQL> col event for a20
select instance_number,sample_time,session_id,session_serial#,program,event,p1,p2,p3
  from DBA_HIST_ACTIVE_SESS_HISTORY where upper(program) like '%SMON%' AND EVENT='DFS lock handle' ;
  SQL>   2
INSTANCE_NUMBER SAMPLE_TIME     SESSION_ID SESSION_SERIAL# PROGRAM                       EVENT                      P1     P2     P3
--------------- --------------- ---------- --------------- ----------------------------- ---------------- ------------ ------ ------
              1 19:05:24              3761               1 oracle@<instance_name> (SMON) DFS lock handle    1413545989      3   4722
              1 20:12:27              3761               1 oracle@<instance_name> (SMON) DFS lock handle    1413545989      3   6668
              1 20:13:40              3761               1 oracle@<instance_name> (SMON) DFS lock handle    1413545989      3   5787
              1 20:13:50              3761               1 oracle@<instance_name> (SMON) DFS lock handle    1413545989      3   5580
              1 20:14:00              3761               1 oracle@<instance_name> (SMON) DFS lock handle    1413545989      3   5826
... ...
              1 21:04:57              3761               1 oracle@<instance_name> (SMON) DFS lock handle    1413545989      3   1347
              1 21:05:07              3761               1 oracle@<instance_name> (SMON) DFS lock handle    1413545989      3   4582

144 rows selected.

* All of P1 is 1413545989, converted to hexadecimal: ‭54410005‬, 54 ASCII code: T, 41 ASCII code: A => TA enqueue

The P2,P3 of “enq: TX – row lock contention” shows that SMON process was requesting TX enqueue of the UNDO block of the transaction of the Killed DELETE on instance 1.

SQL> select instance_number,session_id,session_serial#,program,event,trunc(p2/65536) XIDUSN, trunc(mod(p2,65536)) XIDSLOT, p3 XIDSQN,COUNT(1) CNT,min(sample_time) start_time,max(sample_time) end_time
from DBA_HIST_ACTIVE_SESS_HISTORY WHERE session_id =3761 and SESSION_SERIAL# = 1 and instance_number=1 AND event like 'enq: TX - contention%'
group by instance_number,session_id,session_serial#,program,event,trunc(p2/65536) , trunc(mod(p2,65536)) , p3
;
  2    3    4
INSTANCE_NUMBER SESSION_ID SESSION_SERIAL# PROGRAM                        EVENT                  XIDUSN  XIDSLOT   XIDSQN CNT START_TIME END_TIME
--------------- ---------- --------------- ------------------------------ -------------------- -------- -------- -------- --- ---------- ----------
              1       3761               1 oracle@<instance_name> (SMON)  enq: TX - contention      932       15   623055  50 20:14:31   21:04:47

SQL>
SQL> select instance_number,session_id,session_serial#,program,event,CURRENT_OBJ#, CURRENT_FILE#, CURRENT_BLOCK#,CURRENT_ROW#,COUNT(1) CNT,min(sample_time) start_time,max(sample_time) end_time
  2  from DBA_HIST_ACTIVE_SESS_HISTORY WHERE session_id =3761 and SESSION_SERIAL# = 1 and instance_number=1 AND event like 'enq: TX - contention%'
group by instance_number,session_id,session_serial#,program,event,CURRENT_OBJ#, CURRENT_FILE#, CURRENT_BLOCK#,CURRENT_ROW#
order by start_time
;
  3    4    5
INSTANCE_NUMBER SESSION_ID SESSION_SERIAL# PROGRAM                       EVENT                CURRENT_OBJ# CURRENT_FILE# CURRENT_BLOCK# CURRENT_ROW#   CNT START_TIME END_TIME
--------------- ---------- --------------- ----------------------------- -------------------- ------------ ------------- -------------- ------------ ----- ---------- ----------
              1       3761               1 oracle@<instance_name> (SMON) enq: TX - contention            1            42              4            0     1 20:14:31   20:14:31
              1       3761               1 oracle@<instance_name> (SMON) enq: TX - contention            1             3              3            0     1 20:16:22   20:16:22 <<<<CURRENT_FILE#:3 ==>UNDO
              1       3761               1 oracle@<instance_name> (SMON) enq: TX - contention            1             3              5            0     4 20:22:17   20:25:49 <<<<CURRENT_FILE#:3 ==>UNDO
              1       3761               1 oracle@<instance_name> (SMON) enq: TX - contention            0             3           1920            0   3 20:30:13   20:30:33 <<<<CURRENT_FILE#:3 ==>UNDO
              1       3761               1 oracle@<instance_name> (SMON) enq: TX - contention            0             3          15728            0 23 20:37:58   20:52:39 <<<<CURRENT_FILE#:3 ==>UNDO
              1       3761               1 oracle@<instance_name> (SMON) enq: TX - contention            1          1002              2            0    18 20:53:09   21:04:47

6 rows selected.

SQL>

解决方案
Please avoid to kill large transaction during high workload period.
If has to kill large transaction, shutdown all remote instances can speed up the transaction recovery.
参考:Transaction Recovery Slow And High Row Lock Contention After Killed Large Transaction in RAC (Doc ID 2668617.1)

发表在 Oracle | 标签为 , , | 评论关闭

最近两种加密oracle数据库文件勒索病毒可以实现直接open数据库恢复

最近接触到两种被加密的oracle数据库,可以实现数据库层面直接open库,通过oracle exp/expdp导出数据,业务直接使用,而不是通过类似dul方式恢复数据,然后业务整合花费大量时间的恢复
.DBF.[DAF1737E].[helprequest@techmail.info].mkp(被加密破坏32个block,对于11g+版本数据库,可以实现完美恢复,所有写入数据文件数据0丢失)
20221101123830


.DBF.id[B482CBD6-2815].[tsai_shen@zohomail.eu].eight(被加密192个block,对于11g+版本数据库,可以实现每个文件丢失数据1M左右的恢复)
20221101123850

以上两种勒索病毒加密的oracle数据库都可以通过Oracle数据文件加密勒索恢复工具实现快速恢复并且顺利open数据导出数据
20221101124127

如果有oracle数据库被加密,希望快速恢复业务(基本上和数据库exp/expdp导出数据效果一样),可以联系我们,提供最大限度数据恢复,数据库层面最完美的恢复效果

发表在 勒索恢复 | 标签为 , , , , , , | 评论关闭

open只有system文件的库

有一个朋友自己想测试只用system文件open库,闲着没事给他测试了下,顺利open成功(主要还是经验比较多,规避了很多坑)
1. 准备参数文件

*.audit_file_dest='C:\app\XFF\admin\ORCL\adump'
*.audit_trail='none'
*.compatible='11.2.0.3.0'
*.control_files='H:\TEMP\11203\control01.ctl'
*.db_block_size=8192
*.db_domain=''
*.db_name='DBM'
*.diagnostic_dest='C:\app\XFF'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=ORCLXDB)'
*.nls_language='SIMPLIFIED CHINESE'
*.nls_territory='CHINA'
*.open_cursors=300
*.pga_aggregate_target=2147483648
*.processes=150
*.remote_login_passwordfile='EXCLUSIVE'
*.sessions=170
*.sga_target=6442450944
*.undo_tablespace='UNDOTBS1'
undo_management=MANUAL
_corrupted_rollback_segments=
_allow_resetlogs_corruption=true

2. 准备重建ctl语句

CREATE CONTROLFILE REUSE DATABASE "DBM" RESETLOGS  NOARCHIVELOG  
    MAXLOGFILES 50  
    MAXLOGMEMBERS 5  
    MAXDATAFILES 100  
    MAXINSTANCES 8  
    MAXLOGHISTORY 226  
LOGFILE  
  GROUP 1 'H:\TEMP\11203\redo01.log'  SIZE 50M,  
  GROUP 2 'H:\TEMP\11203\redo02.log'  SIZE 50M,  
  GROUP 3 'H:\TEMP\11203\redo03.log'  SIZE 50M  
DATAFILE  
'H:\TEMP\11203\system01.dbf'
CHARACTER SET ZHS16GBK  
;  

3. 重建ctl并且resetogs open库

SQL> recover database using backup controlfile until cancel;
ORA-00279: 更改 40438873410 (在 10/21/2022 14:06:16 生成) 对于线程 1 是必需的
ORA-00289: 建议:
C:\APP\XFF\PRODUCT\11.2.0.3\DBHOME_1\RDBMS\ARC0000000093_1118545292.0001
ORA-00280: 更改 40438873410 (用于线程 1) 在序列 #93 中


指定日志: {<RET>=suggested | filename | AUTO | CANCEL}
cancel
ORA-01547: 警告: RECOVER 成功但 OPEN RESETLOGS 将出现如下错误
ORA-01194: 文件 1 需要更多的恢复来保持一致性
ORA-01110: 数据文件 1: 'H:\TEMP\11203\SYSTEM01.DBF'


ORA-01112: 未启动介质恢复


SQL> alter database open resetlogs;
alter database open resetlogs
*
第 1 行出现错误:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-01176: data dictionary has more than the 100 files allowed by the
controlfie
进程 ID: 3952
会话 ID: 14 序列号: 3

MAXDATAFILES值不对修改正确值,重建ctl,open库

SQL> RECOVER DATABASE;
完成介质恢复。
SQL> ALTER DATABASE OPEN;
ALTER DATABASE OPEN
*
第 1 行出现错误:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number  with name "" too small
进程 ID: 6916
会话 ID: 14 序列号: 1

alert日志内容

Database Characterset is ZHS16GBK
Errors in file C:\APP\XFF\diag\rdbms\dbm\test\trace\test_smon_9384.trc:
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-01555: 快照过旧: 回退段号  (名称为 "") 过小
Errors in file C:\APP\XFF\diag\rdbms\dbm\test\trace\test_ora_6916.trc:
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-01555: 快照过旧: 回退段号  (名称为 "") 过小
Errors in file C:\APP\XFF\diag\rdbms\dbm\test\trace\test_ora_6916.trc:
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-01555: 快照过旧: 回退段号  (名称为 "") 过小
Error 604 happened during db open, shutting down database
USER (ospid: 6916): terminating the instance due to error 604
Errors in file C:\APP\XFF\diag\rdbms\dbm\test\trace\test_smon_9384.trc  (incident=2521):
ORA-00600: 内部错误代码, 参数: [2662], [9], [1784188335], [9], [1784216952], [6019273], [], [], [], [], [], []
Incident details in: C:\APP\XFF\diag\rdbms\dbm\test\incident\incdir_2521\test_smon_9384_i2521.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Non-fatal internal error happenned while SMON was doing temporary segment drop.
SMON encountered 1 out of maximum 100 non-fatal internal errors.
Tue Nov 01 10:17:49 2022
Instance terminated by USER, pid = 6916
ORA-1092 signalled during: ALTER DATABASE OPEN...

修改文件头scn,并正常open库

SQL> startup nomount pfile='d:/pfile.txt'
ORACLE 例程已经启动。

Total System Global Area 6413680640 bytes
Fixed Size                  2267184 bytes
Variable Size            1107298256 bytes
Database Buffers         5284823040 bytes
Redo Buffers               19292160 bytes
SQL> alter database mount;

数据库已更改。

SQL> set numw 16
SQL> col CHECKPOINT_TIME for a40
SQL> set lines 150
SQL> set pages 1000
SQL> SELECT status,
  2  to_char(checkpoint_time,'yyyy-mm-dd hh24:mi:ss') checkpoint_time,FUZZY,checkpoint_change#,
  3  count(*) ROW_NUM
  4  FROM v$datafile_header
  5  GROUP BY status, checkpoint_change#, to_char(checkpoint_time,'yyyy-mm-dd hh24:mi:ss'),fuzzy
  6  ORDER BY status, checkpoint_change#, checkpoint_time;

STATUS  CHECKPOINT_TIME                          FUZ CHECKPOINT_CHANGE#          ROW_NUM
------- ---------------------------------------- --- ------------------ ----------------
OFFLINE                                                               0              121
ONLINE  2022-11-01 10:17:44                      YES        40438893615                1

20221101102342


SQL> alter database open;

数据库已更改。

SQL> select name from v$datafile;

NAME
--------------------------------------------------------------------------
H:\TEMP\11203\SYSTEM01.DBF
C:\APP\XFF\PRODUCT\11.2.0.3\DBHOME_1\DATABASE\MISSING00002
C:\APP\XFF\PRODUCT\11.2.0.3\DBHOME_1\DATABASE\MISSING00003
C:\APP\XFF\PRODUCT\11.2.0.3\DBHOME_1\DATABASE\MISSING00004
C:\APP\XFF\PRODUCT\11.2.0.3\DBHOME_1\DATABASE\MISSING00005
………………
C:\APP\XFF\PRODUCT\11.2.0.3\DBHOME_1\DATABASE\MISSING00121
C:\APP\XFF\PRODUCT\11.2.0.3\DBHOME_1\DATABASE\MISSING00122

已选择122行。

恢复完成

发表在 Oracle备份恢复 | 标签为 | 评论关闭