月归档:八月 2015

在win 64位平台上运行bbed(支持ORACLE 10g 11g 12c)

很多朋友反馈在win 64位操作系统之上无法使用bbed(包括9i,10g,11g,12c数据库版本),以前写过一篇文章,完美实现了在win平台的各个版本的数据库版本之上实现使用bbed(在win中运行bbed程序),可惜很遗憾没有注明平台信息,留下了不少疑问,今天在自己的电脑上再次实现此功能,用来证明win 64位的平台之上也可以运行bbed程序(数据库版本包括10g,11g,12c,在10g之前x86架构中无win 64位版本数据库,因此我也无能为力).

操作系统版本64位
本机测试为win 7 64位操作系统
win-64


win-64-2


数据库版本64位
本机测试数据库版本为12.1.0.2 64位版本(因为12c都支持,那对于10g/11g更是不在话下)
db-64


bbed运行情况
这里的bbed只是运行起来,并未加载数据文件,因此这里看到的FILENAME为空,但是不妨碍证明bbed可以在win平台,64位的数据库中运行
bbed-win64


整体证明win 64位平台,64位数据库运行bbed
一图抵上千言万语,让我们使用一幅完整截图来说明,bbed是可以运行在win 64位平台的64位版本的数据库之上(而且这里使用了目前最新的12.1.0.2版本)
oracle-64-bbed-10g-11g-12c

发表在 非常规恢复 | 标签为 , , , , , | 2 条评论

记录一次存储异常数据库恢复后遗症ORA-600[kafspa:columnBuffer1]错误处理

先说下前提,这个库以前是由于存储异常,找硬件厂商做了raid重组,然后我进行数据恢复的,恢复出来数据之后,应用厂商通过验证和补数据,然后迁移到另外一台机器做生产用的,这个库一直没有怎么看,最近检查数据库发现ORA-600[kafspa:columnBuffer1]错误,通过删除异常记录的方式解决.
数据库alert日志

Mon Aug 10 00:00:21 2015
LNS: Standby redo logfile selected for thread 1 sequence 617 for destination LOG_ARCHIVE_DEST_2
Mon Aug 10 00:00:33 2015
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\trace\xff_j002_6900.trc  (incident=146517):
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [2883], [1], [], [], [], [], [], [], [], [], []
Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\incident\incdir_146517\xff_j002_6900_i146517.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\trace\xff_j002_6900.trc:
ORA-12012: error on auto execute of job "SYS"."ORA$AT_OS_OPT_SY_280"
ORA-20011: Approximate NDV failed: 
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [2883], [1], [], [], [], [], [], [], [], [], []
ORA-06512: at "SYS.DBMS_STATS", line 31228

分析日志发现

*** 2015-07-19 06:00:30.231
*** SESSION ID:(578.751) 2015-07-19 06:00:30.231
*** CLIENT ID:() 2015-07-19 06:00:30.231
*** SERVICE NAME:(SYS$USERS) 2015-07-19 06:00:30.231
*** MODULE NAME:(DBMS_SCHEDULER) 2015-07-19 06:00:30.231
*** ACTION NAME:(ORA$AT_OS_OPT_SY_220) 2015-07-19 06:00:30.231
 
Dump continued from file: D:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\trace\xff_j001_4444.trc
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [2883], [1], [], [], [], [], [], [], [], [], []

========= Dump for incident 146142 (ORA 600 [kafspa:columnBuffer1]) ========

*** 2015-07-19 06:00:30.231
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- Current SQL Statement for this session (sql_id=g0q33k8qtbcpd) -----
/* SQL Analyze(1) */ select /*+  full(t)    no_parallel(t) no_parallel_index(t)
 dbms_stats cursor_sharing_exact use_weak_name_resl 
dynamic_sampling(0) no_monitoring no_substrb_pad 
…………
to_char(substrb(dump(max("LIST_NO"),16,0,32),1,120)) from "CHF"."T_XIFENFEI" t 
…………

对表进行收集统计信息


SQL> EXEC DBMS_STATS.gather_table_stats('CHF','T_XIFENFEI',CASCADE=>TRUE)
;
BEGIN DBMS_STATS.gather_table_stats('CHF','T_XIFENFEI',CASCADE=>TRUE); EN
D;

*
第 1 行出现错误:
ORA-20011: Approximate NDV failed: ORA-00600: 内部错误代码, 参数:
[kafspa:columnBuffer1], [2883], [1], [], [], [], [], [], [], [], [], []
ORA-06512: 在 "SYS.DBMS_STATS", line 24232
ORA-06512: 在 "SYS.DBMS_STATS", line 24332
ORA-06512: 在 line 1


SQL> desc "CHF"."T_XIFENFEI"
 名称                                      是否为空? 类型
 ----------------------------------------- -------- -----------------

 VISIT_DATE                                         DATE
…………
 GETDRUG_FLAG                                       VARCHAR2(2)
…………

通过上面的alert日志和trace文件以及人工收集统计信息,基本上可以定位是由于数据库自动收集统计信息进程在进行统计信息收集之时,对于”CHF”.”T_XIFENFEI”表进行收集统计信息由于某种错误,从而出现该错误.查询mos,发现此类问题主要是由于varchar2类型存储的数据长度超过了表定义长度.

通过验证官方所说

C:\Users\Administrator>exp "'/ as sysdba'" tables="CHF"."T_XIFENFEI" file
=y:/1.dmp log=y:/1.log

Export: Release 11.2.0.4.0 - Production on 星期四 8月 13 11:03:22 2015

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.


连接到: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Produc
tion
With the Partitioning, OLAP, Data Mining and Real Application Testing options
已导出 ZHS16GBK 字符集和 AL16UTF16 NCHAR 字符集

即将导出指定的表通过常规路径...
当前的用户已更改为 CHF
. . 正在导出表                    T_XIFENFEI
EXP-00015: 错误出现在行 1339552 (表 T_XIFENFEI, 列 GETDRUG_FLAG), 数据类型 1
EXP-00001: 数据字段截断 - 列长度 =2, 缓冲区大小 =2 实际大小 =17Errors in file :
OCI-21500: 内部错误代码, 参数: [kghfrempty:ds], [0x00652FCC8], [], [], [], [], [
], []


----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
kgerinv_internal()+  CALL???  skgudmp()            000000000 006447680 000000000

139                                                006447680
kgerinv()+49         CALL???  kgerinv_internal()   000000001 000676B4D 0064985B0

                                                   000000000
kgerin()+49          CALL???  kgerinv()            000000018 000799612 000072000

                                                   000000000
kghnerror()+294      CALL???  kgerin()             006447680 00645092C 006447680

                                                   000000001
kghfrempty()+639     CALL???  kghnerror()          0000001F0 000000000
                                                   BE019800000000 7E01960000
kghgex()+1433        CALL???  kghfrempty()+368     000000000 00652CAD8 000000000

                                                   000000000
kghfnd()+808         CALL???  kghgex()             001004000 000000000 001BEDD10

                                                   001A7131C
kghalo()+610         CALL???  kghfnd()             00012C450 00012C4A0 000000000

                                                   006446FD0
kghgex()+445         CALL???  kghalo()             006494848 000000000 001BEDD10

                                                   00190A575
kghfnd()+808         CALL???  kghgex()             000000001 0000001A0 000000000

                                                   006493D68
kghalo()+610         CALL???  kghfnd()             000000000 006447680 0FFFFFFFF

                                                   006447680
kpuhhalo()+358       CALL???  kghalo()             000000000 000000178 07FFFFFFF

                                                   000000001
kpuertb_reallocTemp  CALL???  kpuhhalo()           00652C498 000003E84 001C0EA44

Buf()+192                                          000000000
kpuex_reallocTempBu  CALL???  kpuertb_reallocTemp  000004007 0018BA3BF 00012CAB0

f()+67                        Buf()                001AB296F
kpudefn()+347        CALL???  kpuex_reallocTempBu  00012CC38 001004000 001BEDD44

                              f()                  000000004
kpudfn()+1506        CALL???  kpudefn()            00012F3D0 000000004 006520044

                                                   000000000
OCIDefineByPos()+10  CALL???  kpudfn()             004327570 000000000 00012F3D0

2                                                  000000004
00000001400116E5     CALL???  OCIDefineByPos()     1043B9300 0043B92C0 0044002B8

                                                   004401394
000000014004AFC7     CALL???  00000001400113BA     00012F380 00012F0E0 000000068

                                                   14004B2B6
000000014001E784     CALL???  000000014004A37E     000013F30 140095A71 140097520

                                                   14009F540
00000001400027A7     CALL???  000000014001E39F     14009F838 00012FB5C 140097520

                                                   14009F540
000000014000102C     CALL???  0000000140001E2C     000000005 004327570
                                                   1D0D5749D21764D 000000000
000000014006BEF0     CALL???  000000014000100E     000130000 1AFBFE2D0D8
                                                   000000000 000000000
000000007748652D     CALL???  000000014006BDD0     000000000 000000000 000000000

                                                   000000000
00000000775BC521     CALL???  0000000077486520     000000000 000000000 000000000

                                                   000000000


call stack performance statistics:
total                  : 0.778000 sec
setup                  : 0.350000 sec
stack unwind           : 0.099000 sec
symbol translation     : 0.021000 sec
printing the call stack: 0.304000 sec
printing frame data    : 0.000000 sec
printing argument data : 0.000000 sec


----- End of Call Stack Trace -----

这里通过exp验证到数据在GETDRUG_FLAG列上有异常,本来定义列长度为2,可是实际数据长度为17,明显不符

通过plsql定位具体错误rowid

SQL> set serveroutput on
SQL> DECLARE
  2   TYPE RowIDTab IS TABLE OF ROWID INDEX BY BINARY_INTEGER;
  3   CURSOR c1 IS  select /*+index(t PK_T_XIFENFEI_BAK_NEW)*/ rowid from CHF.T_XIFENFEI t;
  4   r RowIDTab;
  5   rows  NATURAL := 20000;
  6   bad_rows number := 0 ;
  7   errors number;
  8   error_code number;
  9   myrowid rowid;
 10  BEGIN
 11   OPEN c1;
 12   LOOP
 13     FETCH  c1 BULK COLLECT INTO r LIMIT rows;
 14     EXIT WHEN r.count=0;
 15     BEGIN
 16      FORALL i IN r.FIRST..r.LAST SAVE EXCEPTIONS
 17       insert into CHF.T_XIFENFEI_new
 18       select /*+ ROWID(A) */ *
 19       from CHF.T_XIFENFEI A where rowid = r(i);
 20     EXCEPTION
 21     when OTHERS then
 22      BEGIN
 23       errors := SQL%BULK_EXCEPTIONS.COUNT;
 24       FOR err1 IN 1..errors LOOP
 25           error_code := SQL%BULK_EXCEPTIONS(err1).ERROR_CODE;
 26           myrowid := r(SQL%BULK_EXCEPTIONS(err1).ERROR_INDEX);
 27           bad_rows := bad_rows + 1;
 28           insert into system.had_rows values('CHF.T_XIFENFEI',myrowid, error_code);
 29       END LOOP;
 30       END;
 31     END;
 32    commit;
 33   END LOOP;
 34   commit;
 35   CLOSE c1;
 36   dbms_output.put_line('Total Bad Rows: '||bad_rows);
 37  END;
 38  /
Total Bad Rows: 1

PL/SQL 过程已成功完成。


SQL> SELECT row_id FROM  system.had_rows ;

 ROW_ID
 ------------------

 AAAT8wAAEAAAM29AAX

SQL> select * from  CHF.T_XIFENFEI WHERE ROWID='AAAT8wAAEAAAM29AAX';
select * from  CHF.T_XIFENFEI WHERE ROWID='AAAT8wAAEAAAM29AAX'
                        *
第 1 行出现错误:
ORA-00600: 内部错误代码, 参数: [kafspa:columnBuffer1], [2883], [1], [], [], [],
[], [], [], [], [], []

通过这里基本上可以定位到是该条rowid记录有问题,和业务进行沟通,确定该条记录可以删除(也不能访问,其实不删除也没用)

删除异常记录

SQL> delete from CHF.T_XIFENFEI WHERE ROWID='AAAT8wAAEAAAM29AAX';

已删除 1 行。

SQL> commit;

提交完成。

收集统计信息

SQL> EXEC DBMS_STATS.gather_table_stats('CHF','T_XIFENFEI',CASCADE=>TRUE)
;

PL/SQL 过程已成功完成。

通过清理异常记录,数据库可以正常收集统计信息,未再报ORA-00600[kafspa:columnBuffer1]错误,故障较完美解决

补充几个现象
1. analyze table “CHF”.”T_XIFENFEI” estimate statistics; 分析表统计信息正常,但是dbms_stats收集报错(因为dbms_stats相当对于每个列进行了扫描,而analyze应该不是)
2. 在报ORA-00600[kafspa:columnBuffer1]的情况下,ctas依旧可以成功,但是普通插入不行(因为ctas相当加油append操作),因此在有些情况下,需要慎重append(特别是有逻辑坏块的时候)

发表在 ORA-xxxxx | 标签为 , | 评论关闭

aix平台 ORA-01115 ORA-01110 ORA-27067 故障恢复

接到朋友恢复请求,aix 5.3,Oracle 10.2.0.1平台,数据库启动报ORA-01115 ORA-01110 ORA-27067错误,数据库无法正常打开,通过分析,是由于10201在aix上面的bug导致,通过技巧规避,完美解决给问题,数据0丢失
数据库报错alert日志

Mon Aug 10 13:25:22 2015
ALTER DATABASE   MOUNT
Mon Aug 10 13:25:29 2015
Setting recovery target incarnation to 1
Mon Aug 10 13:25:29 2015
Successful mount of redo thread 1, with mount id 432339141
Mon Aug 10 13:25:29 2015
Database mounted in Exclusive Mode
Completed: ALTER DATABASE   MOUNT
Mon Aug 10 13:25:36 2015
alter database open
Mon Aug 10 13:25:36 2015
Beginning crash recovery of 1 threads
 parallel recovery started with 15 processes
Mon Aug 10 13:25:37 2015
Started redo scan
Mon Aug 10 13:25:52 2015
Completed redo scan
 7889582 redo blocks read, 75305 data blocks need recovery
Mon Aug 10 13:25:53 2015
Errors in file /dc/admin/datacent/bdump/datacent_p002_144124.trc:
ORA-01115: IO error reading block from file 2 (block # 40704)
ORA-01110: data file 2: '/dc/oradata/datacent/undotbs01.dbf'
ORA-27067: size of I/O buffer is invalid
Additional information: 2
Additional information: 1572864
Mon Aug 10 13:25:53 2015
Aborting crash recovery due to slave death, attempting serial crash recovery
Mon Aug 10 13:25:53 2015
Beginning crash recovery of 1 threads
Mon Aug 10 13:25:53 2015
Started redo scan
Mon Aug 10 13:26:09 2015
Completed redo scan
 7889582 redo blocks read, 75305 data blocks need recovery
Mon Aug 10 13:26:12 2015
Aborting crash recovery due to error 1115
Mon Aug 10 13:26:12 2015
Errors in file /dc/admin/datacent/udump/datacent_ora_123384.trc:
ORA-01115: IO error reading block from file 2 (block # 39077)
ORA-01110: data file 2: '/dc/oradata/datacent/undotbs01.dbf'
ORA-27067: size of I/O buffer is invalid
Additional information: 2
Additional information: 1310720
ORA-1115 signalled during: alter database open...

这里报的前面两个错误ORA-01115 ORA-01110我们都非常熟悉,类似数据库启动遇到坏块或者io错误之时可能就会报如此错误。但是ORA-27067确实不多见,从mos上看,很多是由于rman备份之时的bug可能导致该错误。

dbv检测undo坏块文件

DBVERIFY: Release 10.2.0.1.0 - Production on Mon Aug 10 23:18:15 2015

Copyright (c) 1982, 2003, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = /dc/oradata/datacent/undotbs01.dbf


DBVERIFY - Verification complete

Total Pages Examined         : 329600
Total Pages Processed (Data) : 0
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 327504
Total Pages Processed (Seg)  : 17
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 2096
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 1887888 (0.1887888)

这里可以看到,undo文件本身并没有逻辑和物理的坏块,证明因为数据库异常的原因,可能是由于ORA-27067: size of I/O buffer is invalid导致。根据官方文档ORA-01115 ORA-27067 DURING PARALLEL INSTANCE RECOVERY AFTER INSTANCE CRASH中的解释,我们基本上可以确定很可能是由于10.2.0.1在aix平台的jfs2系统中,由于大量事务操作,突然abort掉数据库(也可能断电),从而数据库在启动的时候进行实例恢复,而由于内部的bug,导致实例恢复无法成功。通过我们处理后的,数据库完美启动,数据0丢失

数据库启动日志

Mon Aug 10 16:34:14 2015
alter database open
Mon Aug 10 16:34:14 2015
Beginning crash recovery of 1 threads
 parallel recovery started with 15 processes
Mon Aug 10 16:34:14 2015
Started redo scan
Mon Aug 10 16:34:27 2015
Completed redo scan
 7889582 redo blocks read, 0 data blocks need recovery
Mon Aug 10 16:34:27 2015
Started redo application at
 Thread 1: logseq 664704, block 1286922
Mon Aug 10 16:34:27 2015
Recovery of Online Redo Log: Thread 1 Group 4 Seq 664704 Reading mem 0
  Mem# 0 errs 0: /dev/rredo04
Mon Aug 10 16:34:32 2015
Recovery of Online Redo Log: Thread 1 Group 5 Seq 664705 Reading mem 0
  Mem# 0 errs 0: /dev/rredo05
Mon Aug 10 16:34:38 2015
Recovery of Online Redo Log: Thread 1 Group 6 Seq 664706 Reading mem 0
  Mem# 0 errs 0: /dev/rredo06
Mon Aug 10 16:34:40 2015
Completed redo application
Mon Aug 10 16:34:40 2015
Completed crash recovery at
 Thread 1: logseq 664706, block 1017805, scn 8554793334
 0 data blocks read, 0 data blocks written, 7889582 redo blocks read
Mon Aug 10 16:34:40 2015
Thread 1 advanced to log sequence 664707
Thread 1 opened at log sequence 664707
  Current log# 1 seq# 664707 mem# 0: /dev/rredo01
Successful open of redo thread 1
Mon Aug 10 16:34:40 2015
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Mon Aug 10 16:34:40 2015
SMON: enabling cache recovery
Mon Aug 10 16:34:40 2015
Successfully onlined Undo Tablespace 1.
Mon Aug 10 16:34:40 2015
SMON: enabling tx recovery
Mon Aug 10 16:34:41 2015
Database Characterset is ZHS32GB18030
replication_dependency_tracking turned off (no async multimaster replication found)
WARNING: AQ_TM_PROCESSES is set to 0. System operation might be adversely affected.
Mon Aug 10 16:34:41 2015
SMON: Parallel transaction recovery tried
Mon Aug 10 16:34:42 2015
db_recovery_file_dest_size of 2048 MB is 0.00% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Mon Aug 10 16:34:42 2015
Completed: alter database open
发表在 ORA-xxxxx, Oracle备份恢复 | 标签为 , , , | 2 条评论