分类目录归档:Oracle

ORACLE update 操作内部原理

对于oracle的update操作,在数据块中具体是如何出来,是直接更新原来值,还是通过插入新值修改指针的方法实现.下面通过证明:
模拟表插入数据

SQL> create table t_xifenfei(id number,name varchar2(10));

Table created.

SQL> insert into t_xifenfei values(1,'XFF');

1 row created.

SQL> insert into t_xifenfei values(2,'CHF');

1 row created.

SQL> commit; 

Commit complete.

SQL> alter system checkpoint;

System altered.

SQL> select id,rowid,
  2  dbms_rowid.rowid_relative_fno(rowid)rel_fno,
  3  dbms_rowid.rowid_block_number(rowid)blockno,
  4  dbms_rowid.rowid_row_number(rowid) rowno
  5  from t_xifenfei;

        ID ROWID                 REL_FNO    BLOCKNO      ROWNO
---------- ------------------ ---------- ---------- ----------
         1 AAASc+AAEAAAACvAAA          4        175          0
         2 AAASc+AAEAAAACvAAB          4        175          1

SQL> alter system dump datafile 4 block 175;

System altered.

SQL>  select value from v$diag_info where name='Default Trace File';

VALUE
--------------------------------------------------------------------------------
/u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_24625.trc

数据存储对应16进制值

SQL> select dump(1,'16') from dual;

DUMP(1,'16')
-----------------
Typ=2 Len=2: c1,2

SQL> select dump(2,'16') from dual;

DUMP(2,'16')
-----------------
Typ=2 Len=2: c1,3

SQL> select dump('XFF','16') FROM DUAL;

DUMP('XFF','16')
----------------------
Typ=96 Len=3: 58,46,46

SQL> SELECT DUMP('CHF','16') FROM DUAL;

DUMP('CHF','16')
----------------------
Typ=96 Len=3: 43,48,46

得出第一条记录对应值为:02c10203584646;第二条记录对应值为:02c10303434846

dump 数据块得到记录

bdba: 0x010000af
data_block_dump,data header at 0xb683c064
===============
tsiz: 0x1f98
hsiz: 0x16
pbl: 0xb683c064
     76543210
flag=--------
ntab=1
nrow=2
frre=-1
fsbo=0x16
fseo=0x1f84
avsp=0x1f6e
tosp=0x1f6e
0xe:pti[0]      nrow=2  offs=0
0x12:pri[0]     offs=0x1f8e    ---->8078
0x14:pri[1]     offs=0x1f84    ---->8068
block_row_dump:
tab 0, row 0, @0x1f8e
tl: 10 fb: --H-FL-- lb: 0x1  cc: 2
col  0: [ 2]  c1 02
col  1: [ 3]  58 46 46
tab 0, row 1, @0x1f84
tl: 10 fb: --H-FL-- lb: 0x1  cc: 2
col  0: [ 2]  c1 03
col  1: [ 3]  43 48 46
end_of_block_dump
End dump data blocks tsn: 4 file#: 4 minblk 175 maxblk 175

bbed查看相关记录

BBED> p kdbr
sb2 kdbr[0]                                 @118      8078 <--第一条row directory指针位置
sb2 kdbr[1]                                 @120      8068 <--第二条row directory指针位置

BBED> p *kdbr[0] 
rowdata[10]
-----------
ub1 rowdata[10]                             @8178     0x2c

BBED> x /rnc  
rowdata[10]                                 @8178    
-----------
flag@8178: 0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@8179: 0x01
cols@8180:    2

col    0[2] @8181: 1 
col    1[3] @8184: XFF


BBED> p *kdbr[1]
rowdata[0]
----------
ub1 rowdata[0]                              @8168     0x2c

BBED> x /rnc
rowdata[0]                                  @8168    
----------
flag@8168: 0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@8169: 0x01
cols@8170:    2

col    0[2] @8171: 2 
col    1[3] @8174: CHF


BBED> d
 File: /u01/oracle/oradata/ora11g/users01.dbf (4)
 Block: 175              Offsets: 8168 to 8191           Dba:0x010000af
------------------------------------------------------------------------
 2c010202 c1030343 48462c01 0202c102 03584646 010650e5 

 <32 bytes per line>

这里可以得到结论如下:
1.数据是从块的底部开始往上存储
2.在每一条记录的头部分别有flag/lock/cols对应这里的2c0102
3.这里的偏移量和dump出来的数据可以看出来两条记录是连续在一起(偏移量分别为:8168和8178)

更新一条记录

SQL> update t_xifenfei set name='XIFENFEI' where id=1;

1 row updated.

SQL> commit;

Commit complete.

SQL> alter system checkpoint;

System altered.

SQL> alter system dump datafile 4 block 175;

System altered.

SQL> select dump('XIFENFEI','16') from dual;

DUMP('XIFENFEI','16')
-------------------------------------
Typ=96 Len=8: 58,49,46,45,4e,46,45,49

我们可以但看到值有XFF改变为XIFENFEI,存储长度变大

dump数据块信息

bdba: 0x010000af
data_block_dump,data header at 0xb683c064
===============
tsiz: 0x1f98
hsiz: 0x16
pbl: 0xb683c064
     76543210
flag=--------
ntab=1
nrow=2
frre=-1
fsbo=0x16
fseo=0x1f75
avsp=0x1f69
tosp=0x1f69
0xe:pti[0]      nrow=2  offs=0
0x12:pri[0]     offs=0x1f75    ---->8053
0x14:pri[1]     offs=0x1f84    ---->8068
block_row_dump:
tab 0, row 0, @0x1f75
tl: 15 fb: --H-FL-- lb: 0x2  cc: 2
col  0: [ 2]  c1 02
col  1: [ 8]  58 49 46 45 4e 46 45 49
tab 0, row 1, @0x1f84
tl: 10 fb: --H-FL-- lb: 0x0  cc: 2
col  0: [ 2]  c1 03
col  1: [ 3]  43 48 46
end_of_block_dump
End dump data blocks tsn: 4 file#: 4 minblk 175 maxblk 175

通过对比第一次dump出来的数据块发现:row 0的值和偏移量发生了变化

bbed查看相关记录

BBED> set file 4 block 175
        FILE#           4
        BLOCK#          175

BBED> map
 File: /u01/oracle/oradata/ora11g/users01.dbf (4)
 Block: 175                                   Dba:0x010000af
------------------------------------------------------------
 KTB Data Block (Table/Cluster)

 struct kcbh, 20 bytes                      @0       

 struct ktbbh, 72 bytes                     @20      

 struct kdbh, 14 bytes                      @100     

 struct kdbt[1], 4 bytes                    @114     

 sb2 kdbr[2]                                @118     

 ub1 freespace[8031]                        @122     

 ub1 rowdata[35]                            @8153    

 ub4 tailchk                                @8188    

BBED> p kdbr
sb2 kdbr[0]                                 @118      8053   <--第一条row directory指针位置
sb2 kdbr[1]                                 @120      8068   <--第二条row directory指针位置

BBED> p *kdbr[1]
rowdata[15]
-----------
ub1 rowdata[15]                             @8168     0x2c

BBED> x /rnc
rowdata[15]                                 @8168    
-----------
flag@8168: 0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@8169: 0x00
cols@8170:    2

col    0[2] @8171: 2 
col    1[3] @8174: CHF


BBED> p *kdbr[0]
rowdata[0]
----------
ub1 rowdata[0]                              @8153     0x2c


BBED> x /r
rowdata[0]                                  @8153    
----------
flag@8153: 0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@8154: 0x02
cols@8155:    2

col    0[2] @8156:  0xc1  0x02 
col    1[8] @8159:  0x58  0x49  0x46  0x45  0x4e  0x46  0x45  0x49 

BBED> set count 64
        COUNT           64

 <32 bytes per line>

BBED> d /v
 File: /u01/oracle/oradata/ora11g/users01.dbf (4)
 Block: 175     Offsets: 8153 to 8191  Dba:0x010000af
-------------------------------------------------------
 2c020202 c1020858 4946454e 4645492c l ,......XIFENFEI,
 000202c1 03034348 462c0002 02c10203 l ......CHF,......
 58464602 068de8                     l XFF....

 <16 bytes per line>

从这里可以看到
1.这里可以看到三个值(XFF,CHF,XIFENFEI)均存在,但是通过p kdbr和dump block不能看到,因为row directory中无指针指定到该值上
2.也是通过row directory指针使得我们从原先看到的第一条记录处于数据块最底部变成了现在相对而言的数据部分最上层,
3.绝大多数情况:数据库更新一条记录,不是直接修改数据值,而是重新插入一条新记录,然后修改row directory指针指定到新的offset上
4.不是直接update,而是insert+指针来实现,这样做的好处:1)如果修改记录update值的长度发生变化(变大或者变小)那么该值之前的数据都要发生变动,对数据库来说成本太高.2)如果直接更新值可能导致其他数据变动,使得其他行受到影响.
5.由于是修改row directory指针,所以该处理方法的rowid值不会发生变化

发表在 Oracle | 标签为 | 一条评论

使用oradebug修改数据库scn

闲着无事看到几篇文章介绍了使用oradebug修改数据库scn的案例,这里也做了两个测试,发现该功能确实很巧妙,通过修改内存中的scn值,然后写入控制文件和数据文件,实现修改scn的方法,不过同样该方法的危害性极大,这里仅供测试使用,生产环境切不可乱使用,可能引起很严重后果
数据库版本信息

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Prod
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production

SQL> select '惜分飞' XIFENFEI FROM DUAL;

XIFENF
------
惜分飞

在open库中修改scn

SQL> oradebug setmypid
Statement processed.

--查看当前scn
SQL>  oradebug DUMPvar SGA kcsgscn_
kcslf kcsgscn_ [20009228, 20009248) = 00000000 0007A09F 00000019 00000000 00000000 00000000 00000000 20009034
SQL>  select CHECKPOINT_CHANGE# a from v$datafile;

         A
----------
    499314
    499314
    499314
    499314

SQL> select dbms_flashback.get_system_change_number a from dual;

         A
----------
    499877

SQL> select to_number('7A09F','xxxxxxxxx') from dual;

TO_NUMBER('7A09F','XXXXXXXXX')
------------------------------
                        499871

--修改内存中scn值(十进制)
SQL>  oradebug poke 0x20009228 4 8
BEFORE: [20009228, 2000922C) = 00000000
AFTER:  [20009228, 2000922C) = 00000008
SQL> oradebug DUMPvar SGA kcsgscn_
kcslf kcsgscn_ [20009228, 20009248) = 00000008 0007A0D8 00000052 00000000 00000000 00000000 00000000 20009034

SQL> col a for 999999999999999
SQL> select dbms_flashback.get_system_change_number a from dual;

         A
------------
34360238301

SQL> select to_number('8','xx')*4294967296+to_number('0007A0D8','xxxxxxxx') a from dual;

               A
----------------
     34360238296

--做一个checkpoint为了内存中的scn值写入控制文件和数据文件
SQL>  alter system checkpoint;

System altered.

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.

Total System Global Area  318767104 bytes
Fixed Size                  1267236 bytes
Variable Size              96471516 bytes
Database Buffers          213909504 bytes
Redo Buffers                7118848 bytes
Database mounted.
Database opened.

SQL> col a for 999999999999999
SQL> select CHECKPOINT_CHANGE# a from v$datafile;

               A
----------------
     34360238496
     34360238496
     34360238496
     34360238496

SQL> select CHECKPOINT_CHANGE# a from v$datafile_header;

               A
----------------
     34360238496
     34360238496
     34360238496
     34360238496

在mount库中修改scn

SQL> startup mount
ORACLE instance started.

Total System Global Area  318767104 bytes
Fixed Size                  1267236 bytes
Variable Size              96471516 bytes
Database Buffers          213909504 bytes
Redo Buffers                7118848 bytes
Database mounted.
SQL> oradebug setmypid
Statement processed.
--因为数据库是mount状态不能看到scn值
SQL>  oradebug DUMPvar SGA kcsgscn_
kcslf kcsgscn_ [20009228, 20009248) = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 20009034
SQL>  col a for 999999999999999
SQL> select CHECKPOINT_CHANGE# a from v$datafile_header;

               A
----------------
     34360240739
     34360240739
     34360240739
     34360240739

--求出WRAP SCN值
SQL> select 34360240739/4294967296 from dual;

34360240739/4294967296
----------------------
            8.00011697

--修改内存中scn值(十六进制)
SQL> oradebug poke 0x20009228 4 0x0000000a
BEFORE: [20009228, 2000922C) = 00000000
AFTER:  [20009228, 2000922C) = 0000000A
SQL> oradebug DUMPvar SGA kcsgscn_
kcslf kcsgscn_ [20009228, 20009248) = 0000000A 00000000 00000000 00000000 00000000 00000000 00000000 20009034

SQL> alter database open;

Database altered.

SQL> select dbms_flashback.get_system_change_number a from dual
  2  ;

               A
----------------
     42949673074

--注意:使用此种方法修改BASE SCN如果不指定,会从0开始计数
SQL>  oradebug DUMPvar SGA kcsgscn_
kcslf kcsgscn_ [20009228, 20009248) = 0000000A 00000077 0000001C 00000000 00000000 00000000 00000000 20009034
SQL> select to_number('A','xx')*4294967296+to_number('00000077','xxxxxxxx') a from dual;

               A
----------------
     42949673079

SQL> alter system checkpoint;

System altered.

SQL> select CHECKPOINT_CHANGE# a from v$datafile_header;

               A
----------------
     42949673095
     42949673095
     42949673095
     42949673095

SQL> select CHECKPOINT_CHANGE# a from v$datafile;

               A
----------------
     42949673095
     42949673095
     42949673095
     42949673095

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.

Total System Global Area  318767104 bytes
Fixed Size                  1267236 bytes
Variable Size              96471516 bytes
Database Buffers          213909504 bytes
Redo Buffers                7118848 bytes
Database mounted.
Database opened.
SQL> select CHECKPOINT_CHANGE# a from v$datafile_header;

               A
----------------
     42949673231
     42949673231
     42949673231
     42949673231

在oradebug推进scn的过程中,需要注意不同平台,不同位数的ORACLE数据库可能推进方式有一定的区别,操作前最好在系统平台位数上进行测试,否则有可能导致恢复后果更加麻烦

发表在 非常规恢复 | 标签为 , | 3 条评论

DATAGUARD中MAXIMUM AVAILABILITY+LGWR SYNC导致主库不能启动

qq群里面的一朋友的的DG因为备库已经下架,主库重启的时候不能正常启动,帮忙处理结果如下
版本相关信息

Oracle9i Enterprise Edition Release 9.2.0.1.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.1.0 - Production
ORACLE_HOME = /export/home/oracle/product/9.2.0
System name:    SunOS
Node name:      eTermSrv2
Release:        5.10
Version:        Generic
Machine:        sun4u
Instance name: abcd

数据库不能启动日志

Fri Aug 10 12:37:56 2012
ALTER DATABASE OPEN
Fri Aug 10 12:37:56 2012
LGWR: Primary database is in CLUSTER CONSISTENT mode
LGWR: Primary database is in MAXIMUM AVAILABILITY mode
LGWR: Destination LOG_ARCHIVE_DEST_1 is not serviced by LGWR
LNS0 started with pid=16
Fri Aug 10 12:37:59 2012
ORA-1013 signalled during: ALTER DATABASE OPEN...
Fri Aug 10 12:41:45 2012
LGWR: Error 12535 verifying archivelog destination LOG_ARCHIVE_DEST_2
LGWR: Continuing...
Fri Aug 10 12:41:45 2012
Errors in file /export/home/oracle/admin/abcd/bdump/abcd_lgwr_11504.trc:
ORA-12535: TNS:operation timed out
******************************************************************
LGWR: Setting 'active' archival for destination LOG_ARCHIVE_DEST_2
******************************************************************
Creating archive destination LOG_ARCHIVE_DEST_2: 'ora9i'
LGWR: Error 12535 creating archivelog file 'ora9i'
Fri Aug 10 12:45:32 2012
Errors in file /export/home/oracle/admin/abcd/bdump/abcd_lgwr_11504.trc:
ORA-12535: TNS:operation timed out
LGWR: Completed archiving  log 1 thread 1 sequence 6808
Thread 1 advanced to log sequence 6808
Fri Aug 10 12:45:32 2012
Errors in file /export/home/oracle/admin/abcd/bdump/abcd_lgwr_11504.trc:
ORA-01157: cannot identify/lock data file 1 - see DBWR trace file
ORA-01110: data file 1: '/export/home/oracle/oradata/abcd/system01.dbf'
LGWR: terminating instance due to error 1157
Instance terminated by LGWR, pid = 11504

通过这里大概看出数据库原dg配置是MAXIMUM AVAILABILITY,因为备机下架,导致ora9i的tns不能访问,从而出现一些列错误,其中使得lgwr异常,因为oracle的某种内部机制,导致dbwr不能访问数据文件(这里体现出来是system01.dbf不能访问,但实际上应该是所有所有数据文件均不能访问,因为system01.dbf位于第一,所以报出该错误.)

*** SESSION ID:(3.1) 2012-08-10 12:37:56.847
Destination LOG_ARCHIVE_DEST_1 is not serviced by LGWR
*** 2012-08-10 12:41:45.614
Error 12535 connecting to destination LOG_ARCHIVE_DEST_2 standby host 'ora9i'
*** 2012-08-10 12:41:45.615
LGWR: Error 12535 verifying archivelog destination LOG_ARCHIVE_DEST_2
Continuing...
ORA-12535: TNS:operation timed out
*** 2012-08-10 12:45:32.514
Error 12535 connecting to destination LOG_ARCHIVE_DEST_2 standby host 'ora9i'
Error 12535 attaching to destination LOG_ARCHIVE_DEST_2 standby host 'ora9i'
*** 2012-08-10 12:45:32.514
LGWR: Error 12535 creating archivelog file 'ora9i'
*** 2012-08-10 12:45:32.514
kcrrfail: dest:2 err:12535 force:0
ORA-12535: TNS:operation timed out
error 1157 detected in background process
ORA-01157: cannot identify/lock data file 1 - see DBWR trace file
ORA-01110: data file 1: '/export/home/oracle/oradata/abcd/system01.dbf'

通过trace文件,更加清楚的说明,可能是因为lgwr异常导致dbwr访问数据文件出现问题.

问题分析/解决汇总

SQL> show parameter log_archive_dest_state_1;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
log_archive_dest_state_1             string      enable

SQL> show parameter log_archive_dest_state_2;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
log_archive_dest_state_2             string      enable

SQL> show parameter log_archive_dest_1;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
log_archive_dest_1                   string      location=/export/home/oracle/o
                                                 radata/abcd/archive

SQL> show parameter log_archive_dest_2;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
log_archive_dest_2                   string      SERVICE=ora9i LGWR SYNC AFFIRM

SQL>select  protection_mode,database_role from v$database;

PROTECTION_MODE      DATABASE_ROLE
-------------------- ----------------
MAXIMUM AVAILABILITY PRIMARY

SQL>   show parameter succ;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
log_archive_min_succeed_dest         integer     1

通过上面的sql查询结果和alert日志与trace文件结合分析,大体结论是:
数据库的dg的保护模式为:MAXIMUM AVAILABILITY
数据库的日志传输方式是:LGWR SYNC AFFIRM
现在因为备机下架,主库LGWR不能通过tns访问备库,从而导致数据库的dbwr进程访问异常,是的数据库不能正常被open
这里的故障有一些巧合:MAXIMUM AVAILABILITY+LGWR SYNC AFFIRM+9.2.0.1+SunOS
我不清楚是不是ORACLE的bug导致,但是这个问题解决起来比较简单,只需要修改log_archive_dest_state_2=defer,使得log_archive_dest_2参数不生效,让lgwr不再访问备机

发表在 Data Guard | 2 条评论