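Yearly archive: 2022
PostgreSQL Recovery Series: Basic Use of pg_filedump
When PostgreSQL hits a serious failure and none of the usual methods can start the database, consider using a tool similar to Oracle DUL that reads the data files offline to recover the data. For PostgreSQL, that tool is pg_filedump.
Installing pg_filedump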
[root@xifenfei ~]# yum install pg_filedump_14.x86_64
Loaded plugins: langpacks, ulninfo
Resolving Dependencies
--> Running transaction check
---> Package pg_filedump_14.x86_64 0:14.1-1.rhel7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

======================================================================================================================
 Package                    Arch          Version               Repository      Size
======================================================================================================================
Installing:
 pg_filedump_14             x86_64        14.1-1.rhel7          pgdg14          43 k

Transaction Summary
======================================================================================================================
Install  1 Package

Total download size: 43 k
Installed size: 81 k
Is this ok [y/d/N]: y
Downloading packages:
pg_filedump_14-14.1-1.rhel7.x86_64.rpm                          |  43 kB  00:00:02
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : pg_filedump_14-14.1-1.rhel7.x86_64               1/1
  Verifying  : pg_filedump_14-14.1-1.rhel7.x86_64               1/1

Installed:
  pg_filedump_14.x86_64 0:14.1-1.rhel7

Complete!

-bash-4.2$ pg_filedump

Version 14.1 (for PostgreSQL 8.x .. 14.x)
Copyright (c) 2002-2010 Red Hat, Inc.
Copyright (c) 2011-2022, PostgreSQL Global Development Group

Usage: pg_filedump [-abcdfhikxy] [-R startblock [endblock]] [-D attrlist] [-S blocksize] [-s segsize] [-n segnumber] file

Display formatted contents of a PostgreSQL heap/index/control file
Defaults are: relative addressing, range of the entire file, block
               size as listed on block 0 in the file

The following options are valid for heap and index files:
  -a  Display absolute addresses when formatting (Block header
      information is always block relative)
  -b  Display binary block images within a range (Option will turn
      off all formatting options)
  -d  Display formatted block content dump (Option will turn off
      all other formatting options)
  -D  Decode tuples using given comma separated list of types
      Supported types:
        bigint bigserial bool char charN date float float4 float8 int
        json macaddr name numeric oid real serial smallint smallserial text
        time timestamp timestamptz timetz uuid varchar varcharN xid xml
      ~ ignores all attributes left in a tuple
  -f  Display formatted block content dump along with interpretation
  -h  Display this information
  -i  Display interpreted item details
  -k  Verify block checksums
  -o  Do not dump old values.
  -R  Display specific block ranges within the file (Blocks are
      indexed from 0)
        [startblock]: block to start at
        [endblock]: block to end at
      A startblock without an endblock will format the single block
  -s  Force segment size to [segsize]
  -t  Dump TOAST files
  -v  Ouput additional information about TOAST relations
  -n  Force segment number to [segnumber]
  -S  Force block size to [blocksize]
  -x  Force interpreted formatting of block items as index items
  -y  Force interpreted formatting of block items as heap items

The following options are valid for control files:
  -c  Interpret the file listed as a control file
  -f  Display formatted content dump along with interpretation
  -S  Force block size to [blocksize]
Additional functions:
  -m  Interpret file as pg_filenode.map file and print contents (all
      other options will be ignored)

Report bugs to <pgsql-bugs@postgresql.org>
Creating a test table
-bash-4.2$ psql
psql (14.3)
Type "help" for help.

postgres=# create table pg_xifenfei(id int,name varchar(100));
CREATE TABLE
postgres=# insert into pg_xifenfei values(1,'www.xifenfei.com');
INSERT 0 1
postgres=# insert into pg_xifenfei values(2,'xienfei_pg_recovery');
INSERT 0 1
postgres=# select * from pg_xifenfei;
 id |        name
----+---------------------
  1 | www.xifenfei.com
  2 | xienfei_pg_recovery
(2 rows)

postgres=#
Recovering data with pg_filedump
-bash-4.2$ pg_filedump /var/lib/pgsql/14/data/base/14487/16384

*******************************************************************
* PostgreSQL File/Block Formatted Dump Utility
*
* File: /var/lib/pgsql/14/data/base/14487/16384
* Options used: None
*******************************************************************

Block    0 ********************************************************
<Header> -----
 Block Offset: 0x00000000         Offsets: Lower      32 (0x0020)
 Block: Size 8192  Version    4            Upper    8096 (0x1fa0)
 LSN:  logid      0 recoff 0x16299cf0      Special  8192 (0x2000)
 Items:    2                      Free Space: 8064
 Checksum: 0x0000  Prune XID: 0x00000000  Flags: 0x0000 ()
 Length (including item array): 32

<Data> -----
 Item   1 -- Length:   45  Offset: 8144 (0x1fd0)  Flags: NORMAL
 Item   2 -- Length:   48  Offset: 8096 (0x1fa0)  Flags: NORMAL

*** End of File Encountered. Last Block Read: 0 ***

-bash-4.2$ pg_filedump -D int,charn /var/lib/pgsql/14/data/base/14487/16384|grep COPY
COPY: 1 www.xifenfei.com
COPY: 2 xienfei_pg_recovery
-bash-4.2$ pg_filedump -D int,charn /var/lib/pgsql/14/data/base/14487/16384|grep COPY |awk '{$1=null;print $0}'>/tmp/pg_xifenfei_rec
-bash-4.2$ sed -i 's/^[ ]*//g' /tmp/pg_xifenfei_rec
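The awk and sed pipeline above rebuilds each row with single spaces, which would presumably break any text value that itself contains a space. A minimal alternative sketch follows, assuming that pg_filedump separates the decoded fields after the `COPY: ` prefix with tab characters (an assumption about the tool's output, not something the session above proves). The helper name `extract_copy_rows` is hypothetical.

```shell
# Hypothetical helper: keep only the decoded rows from `pg_filedump -D ...` output.
# Reads stdin and prints every line that starts with "COPY: ", minus that prefix.
# Field separators inside the row are left untouched.
extract_copy_rows() {
    sed -n 's/^COPY: //p'
}

# Canned input shaped like the session above; the tab between the two
# fields is an assumption about pg_filedump's real output.
printf 'Item 1 -- Length: 45\nCOPY: 1\twww.xifenfei.com\n' | extract_copy_rows
```

If the fields really are tab-separated, the resulting file could be loaded with a plain `copy pg_xifenfei from '...'`, since tab is the default delimiter of COPY in text format.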
Importing the data for verification
postgres=# truncate table pg_xifenfei;
TRUNCATE TABLE
postgres=# select * from pg_xifenfei;
 id | name
----+------
(0 rows)

postgres=# copy pg_xifenfei from '/tmp/pg_xifenfei_rec'(DELIMITER ' ');
COPY 2
postgres=# select * from pg_xifenfei;
 id |        name
----+---------------------
  1 | www.xifenfei.com
  2 | xienfei_pg_recovery
(2 rows)
This simple test shows that when a PostgreSQL database is in an extreme failure state, this method can be used as a last-resort data recovery, reducing the damage caused by data loss.
Handling an Oracle power-failure fault
An abnormal power loss left the database damaged; recovering a data file reports ORA-00283, ORA-00742 and ORA-00312.
D:\check_db>sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Tue May 31 00:38:42 2022

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> recover datafile 1;
ORA-00283: recovery session canceled due to errors
ORA-00742: Log read detects lost write in thread %d sequence %d block %d
ORA-00312: online log 3 thread 1: 'D:\APP\ADMINISTRATOR\FAST_RECOVERY_AREA\ORCL\ONLINELOG\O1_MF_3_HJ32KJD5_.LOG'
This error is clearly a lost write caused by the abnormal power loss. Without a backup there is no good way to handle this kind of fault; the only option is to bypass the consistency checks and force the database open. The attempt to force it open reported the following errors.
SQL> startup mount pfile='d:/pfile.txt'
ORACLE instance started.

Total System Global Area 2.0310E+10 bytes
Fixed Size                  2290000 bytes
Variable Size            3690991280 bytes
Database Buffers         1.6576E+10 bytes
Redo Buffers               40837120 bytes
Database mounted.
SQL> recover database until cancel;
ORA-00279: change 18755939194213 generated at  needed for thread 1

Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
D:\APP\ADMINISTRATOR\FAST_RECOVERY_AREA\ORCL\ONLINELOG\O1_MF_3_HJ32KJD5_.LOG
ORA-00600: internal error code, arguments: [3020], [2], [78824], [8467432], [], [], [], [], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 2, block# 78824, file offset is 645726208 bytes)
ORA-10564: tablespace SYSAUX
ORA-01110: data file 2: 'D:\ORADATA\ORCL\SYSAUX01.DBF'
ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 80834

ORA-01112: media recovery not started

SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [krsi_al_hdr_update.15], [4294967295], [], [],[], [], [], [], [], [], [], []
The ORA-600 krsi_al_hdr_update.15 error is mainly caused by damaged redo that prevents resetlogs from succeeding; for details see Alter Database Open Resetlogs returns error ORA-00600: [krsi_al_hdr_update.15], (Doc ID 2026541.1). After that problem was handled, running resetlogs again reported ORA-600 2662.
SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [2662], [4366], [4112122046], [4366], [4112228996], [12583040], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [2662], [4366], [4112122045], [4366], [4112228996], [12583040], [], [], [], [], [], []
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [2662], [4366], [4112122040], [4366], [4112228996], [12583040], [], [], [], [], [], []
Process ID: 4644
Session ID: 1701 Serial number: 3
This problem is relatively simple: adjusting the SCN gets past it. After that, opening the database reports ORA-600 4194 and related errors.
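To make the SCN figures in the ORA-600 [2662] output easier to read, here is a small worked calculation. It assumes the commonly described meaning of the arguments (current SCN wrap and base, followed by the dependent SCN wrap and base) and the usual formula SCN = wrap * 2^32 + base. It only does arithmetic and is not the procedure used to adjust the SCN in this case; the function name is made up for the example.

```shell
# Hypothetical helper: combine an SCN wrap and base into a single number.
# Assumes SCN = wrap * 2^32 + base and 64-bit shell arithmetic.
scn_from_wrap_base() {
    echo $(( $1 * 4294967296 + $2 ))
}

# Values copied from the first ORA-600 [2662] line above.
current=$(scn_from_wrap_base 4366 4112122046)     # assumed: current SCN
dependent=$(scn_from_wrap_base 4366 4112228996)   # assumed: dependent SCN
echo "current SCN:   $current"
echo "dependent SCN: $dependent"
echo "gap:           $(( dependent - current ))"
```

Under these assumptions the gap shows how far the database SCN lags behind the SCN recorded in the block, which is why raising the SCN is the usual way around this error.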
SQL> alter database open ;
alter database open
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [4194], [

SMON: enabling tx recovery
Database Characterset is ZHS16GBK
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_smon_5112.trc  (incident=322982):
ORA-00600: internal error code, arguments: [4137], [10.33.3070116], [0], [0], [], [], [], [], [], [], [], []
Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\incident\incdir_322982\orcl_smon_5112_i322982.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
replication_dependency_tracking turned off (no async multimaster replication found)
LOGSTDBY: Validating controlfile with logical metadata
LOGSTDBY: Validation complete
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_ora_3340.trc  (incident=323030):
ORA-00600: internal error code, arguments: [4194], [
Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\incident\incdir_323030\orcl_ora_3340_i323030.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Tue May 31 09:05:04 2022
Sweep [inc][322982]: completed
ORACLE Instance orcl (pid = 13) - Error 600 encountered while recovering transaction (10, 33).
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_smon_5112.trc:
ORA-00600: internal error code, arguments: [4137], [10.33.3070116], [0], [0], [], [], [], [], [], [], [], []
Checker run found 1 new persistent data failures
Tue May 31 09:05:05 2022
Sweep [inc][323030]: completed
Sweep [inc2][322982]: completed
Tue May 31 09:05:14 2022
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_smon_5112.trc  (incident=322983):
ORA-00600: internal error code, arguments: [4193], [10.33.3070116], [0], [], [], [], [], [], [], [], [], []
Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\incident\incdir_322983\orcl_smon_5112_i322983.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Tue May 31 09:05:14 2022
ORA-600 signalled during: alter database open...
Block recovery stopped at EOT rba 2.61.16
Block recovery completed at rba 2.61.16, scn 4366.4112429058
Block recovery from logseq 2, block 60 to scn 18755939643393
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0: D:\APP\ADMINISTRATOR\FAST_RECOVERY_AREA\ORCL\ONLINELOG\O1_MF_2_K9BSVC11_.LOG
Block recovery completed at rba 2.61.16, scn 4366.4112429058
Dumping diagnostic data in directory=[cdmp_2022053],requested by(instance=1,osid=5112(SMON)),summary=[incident=322983].
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_smon_5112.trc:
ORA-01595: error freeing extent (3) of rollback segment (1))
ORA-00600: internal error code, arguments: [4193], [10.33.3070116], [3], [], [], [], [], [], [], [], [], []
After the damaged undo was dealt with, the database opened normally.
SQL> shutdown immediate;
ORA-00600: internal error code, arguments: [4193], [
SQL> shutdown abort;
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.

Total System Global Area 2.0310E+10 bytes
Fixed Size                  2290000 bytes
Variable Size            3690991280 bytes
Database Buffers         1.6576E+10 bytes
Redo Buffers               40837120 bytes
Database mounted.
SQL> alter database open;

Database altered.
hcheck found some data dictionary inconsistencies, so the customer was advised to do a logical export and import the data into a new database.
HCheck Version 07MAY18 on 31-May-2022 09:12:22
----------------------------------------------
Catalog Version 11.2.0.4.0 (1102000400)
db_name: ORCL

                                   Catalog       Fixed
Procedure Name                     Version    Vs Release    Timestamp      Result
------------------------------ ... ---------- -- ---------- -------------- ------
.- LobNotInObj ... 1102000400 <= *All Rel* 05/31 09:12:22 PASS
.- MissingOIDOnObjCol ... 1102000400 <= *All Rel* 05/31 09:12:22 PASS
.- SourceNotInObj ... 1102000400 <= *All Rel* 05/31 09:12:22 PASS
.- OversizedFiles ... 1102000400 <= *All Rel* 05/31 09:12:22 PASS
.- PoorDefaultStorage ... 1102000400 <= *All Rel* 05/31 09:12:22 PASS
.- PoorStorage ... 1102000400 <= *All Rel* 05/31 09:12:22 PASS
.- TabPartCountMismatch ... 1102000400 <= *All Rel* 05/31 09:12:22 PASS
.- OrphanedTabComPart ... 1102000400 <= *All Rel* 05/31 09:12:22 PASS
.- MissingSum$ ... 1102000400 <= *All Rel* 05/31 09:12:22 PASS
.- MissingDir$ ... 1102000400 <= *All Rel* 05/31 09:12:22 PASS
.- DuplicateDataobj ... 1102000400 <= *All Rel* 05/31 09:12:22 PASS
.- ObjSynMissing ... 1102000400 <= *All Rel* 05/31 09:12:23 PASS
.- ObjSeqMissing ... 1102000400 <= *All Rel* 05/31 09:12:23 PASS
.- OrphanedUndo ... 1102000400 <= *All Rel* 05/31 09:12:23 PASS
.- OrphanedIndex ... 1102000400 <= *All Rel* 05/31 09:12:23 PASS
.- OrphanedIndexPartition ... 1102000400 <= *All Rel* 05/31 09:12:23 PASS
.- OrphanedIndexSubPartition ... 1102000400 <= *All Rel* 05/31 09:12:23 PASS
.- OrphanedTable ... 1102000400 <= *All Rel* 05/31 09:12:23 PASS
.- OrphanedTablePartition ... 1102000400 <= *All Rel* 05/31 09:12:23 PASS
.- OrphanedTableSubPartition ... 1102000400 <= *All Rel* 05/31 09:12:23 PASS
.- MissingPartCol ... 1102000400 <= *All Rel* 05/31 09:12:23 PASS
.- OrphanedSeg$ ... 1102000400 <= *All Rel* 05/31 09:12:23 PASS
.- OrphanedIndPartObj# ... 1102000400 <= *All Rel* 05/31 09:12:23 FAIL

HCKE-0024: Orphaned Index Partition Obj# (no OBJ$) (Doc ID 1360935.1)
ORPHAN INDPART$: OBJ#=149167 BO#=6378 - no OBJ$ row
ORPHAN INDPART$: OBJ#=149168 BO#=6378 - no OBJ$ row

.- DuplicateBlockUse ... 1102000400 <= *All Rel* 05/31 09:12:23 PASS
.- FetUet ... 1102000400 <= *All Rel* 05/31 09:12:23 PASS
.- Uet0Check ... 1102000400 <= *All Rel* 05/31 09:12:23 PASS
.- SeglessUET ... 1102000400 <= *All Rel* 05/31 09:12:23 PASS
.- BadInd$ ... 1102000400 <= *All Rel* 05/31 09:12:23 FAIL

HCKE-0030: OBJ$ INDEX entry has no IND$ or INDPART$/INDSUBPART$ entry (Doc ID 1360528.1)
OBJ$ INDEX PARTITION has no INDPART$ entry: Obj#=148278 SYS Name=WRH$_FILESTATXS_PK PARTITION=WRH$_FILEST_1572571104_16462
OBJ$ INDEX PARTITION has no INDPART$ entry: Obj#=148920 SYS Name=WRH$_FILESTATXS_PK PARTITION=WRH$_FILEST_1572571104_16678

.- BadTab$ ... 1102000400 <= *All Rel* 05/31 09:12:23 PASS
.- BadIcolDepCnt ... 1102000400 <= *All Rel* 05/31 09:12:23 PASS
.- ObjIndDobj ... 1102000400 <= *All Rel* 05/31 09:12:23 PASS
.- TrgAfterUpgrade ... 1102000400 <= *All Rel* 05/31 09:12:23 PASS
.- ObjType0 ... 1102000400 <= *All Rel* 05/31 09:12:23 PASS
.- BadOwner ... 1102000400 <= *All Rel* 05/31 09:12:23 PASS
.- StmtAuditOnCommit ... 1102000400 <= *All Rel* 05/31 09:12:23 PASS
.- BadPublicObjects ... 1102000400 <= *All Rel* 05/31 09:12:23 PASS
.- BadSegFreelist ... 1102000400 <= *All Rel* 05/31 09:12:23 PASS
.- BadDepends ... 1102000400 <= *All Rel* 05/31 09:12:23 WARN

HCKW-0016: Dependency$ p_timestamp mismatch for VALID objects (Doc ID 1361045.1)
[E] - P_OBJ#=6376 D_OBJ#=6765

.- CheckDual ... 1102000400 <= *All Rel* 05/31 09:12:23 PASS
.- ObjectNames ... 1102000400 <= *All Rel* 05/31 09:12:23 PASS
.- BadCboHiLo ... 1102000400 <= *All Rel* 05/31 09:12:24 PASS
.- ChkIotTs ... 1102000400 <= *All Rel* 05/31 09:12:24 PASS
.- NoSegmentIndex ... 1102000400 <= *All Rel* 05/31 09:12:24 PASS
.- BadNextObject ... 1102000400 <= *All Rel* 05/31 09:12:24 PASS
.- DroppedROTS ... 1102000400 <= *All Rel* 05/31 09:12:24 PASS
.- FilBlkZero ... 1102000400 <= *All Rel* 05/31 09:12:24 PASS
.- DbmsSchemaCopy ... 1102000400 <= *All Rel* 05/31 09:12:24 PASS
.- OrphanedObjError ... 1102000400 > 1102000000 05/31 09:12:24 PASS
.- ObjNotLob ... 1102000400 <= *All Rel* 05/31 09:12:24 PASS
.- MaxControlfSeq ... 1102000400 <= *All Rel* 05/31 09:12:24 PASS
.- SegNotInDeferredStg ... 1102000400 > 1102000000 05/31 09:12:24 PASS
.- SystemNotRfile1 ... 1102000400 > 902000000 05/31 09:12:24 PASS
.- DictOwnNonDefaultSYSTEM ... 1102000400 <= *All Rel* 05/31 09:12:24 PASS
.- OrphanTrigger ... 1102000400 <= *All Rel* 05/31 09:12:24 PASS
.- ObjNotTrigger ... 1102000400 <= *All Rel* 05/31 09:12:24 PASS
---------------------------------------
31-May-2022 09:12:24 Elapsed: 2 secs
---------------------------------------
Found 4 potential problem(s) and 1 warning(s)
Contact Oracle Support with the output and trace file
to check if the above needs attention or not

PL/SQL procedure successfully completed.
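PostgreSQL Recovery Series: Recovering from a Damaged pg_control
In PostgreSQL, the pg_control file plays a role similar to the control file of an Oracle database. In Oracle, if that file is lost or damaged, it can be re-created with the CREATE CONTROLFILE statement; in PostgreSQL it can be re-created with the pg_resetwal command. Because pg_control itself is damaged, several parameters have to be specified by hand to complete the pg_resetwal operation.
pg_resetwal usage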
-bash-4.2$ pg_resetwal --help
pg_resetwal resets the PostgreSQL write-ahead log.

Usage:
  pg_resetwal [OPTION]... DATADIR

Options:
  -c, --commit-timestamp-ids=XID,XID
                                   set oldest and newest transactions bearing
                                   commit timestamp (zero means no change)
 [-D, --pgdata=]DATADIR            data directory
  -e, --epoch=XIDEPOCH             set next transaction ID epoch
  -f, --force                      force update to be done
  -l, --next-wal-file=WALFILE      set minimum starting location for new WAL
  -m, --multixact-ids=MXID,MXID    set next and oldest multitransaction ID
  -n, --dry-run                    no update, just show what would be done
  -o, --next-oid=OID               set next OID
  -O, --multixact-offset=OFFSET    set next multitransaction offset
  -u, --oldest-transaction-id=XID  set oldest transaction ID
  -V, --version                    output version information, then exit
  -x, --next-transaction-id=XID    set next transaction ID
      --wal-segsize=SIZE           size of WAL segments, in megabytes
  -?, --help                       show this help, then exit

Report bugs to <pgsql-bugs@lists.postgresql.org>.
PostgreSQL home page: <https://www.postgresql.org/>
Confirming the current row count of the business table
-bash-4.2$ psql
psql (14.3)
Type "help" for help.

postgres=# select count(1) from ac_event;
 count
--------
 246266
(1 row)
Simulating a pg_control failure
-bash-4.2$ ps -ef|grep postgres
postgres  37178      1  0 09:58 ?        00:00:00 /usr/pgsql-14/bin/postgres -D /var/lib/pgsql/14/data
postgres  37179  37178  0 09:58 ?        00:00:00 postgres: logger
postgres  37181  37178  0 09:58 ?        00:00:00 postgres: checkpointer
postgres  37182  37178  0 09:58 ?        00:00:00 postgres: background writer
postgres  37183  37178  0 09:58 ?        00:00:00 postgres: walwriter
postgres  37184  37178  0 09:58 ?        00:00:00 postgres: autovacuum launcher
postgres  37185  37178  0 09:58 ?        00:00:00 postgres: stats collector
postgres  37186  37178  0 09:58 ?        00:00:00 postgres: logical replication launcher
root      41368  41314  0 11:06 pts/1    00:00:00 su - postgres
postgres  41369  41368  0 11:06 pts/1    00:00:00 -bash
postgres  45071  41369  0 12:07 pts/1    00:00:00 ps -ef
postgres  45072  41369  0 12:07 pts/1    00:00:00 grep --color=auto postgres
-bash-4.2$ kill -9 37178
-bash-4.2$ ps -ef|grep postgres
root      41368  41314  0 11:06 pts/1    00:00:00 su - postgres
postgres  41369  41368  0 11:06 pts/1    00:00:00 -bash
postgres  45095  41369  0 12:08 pts/1    00:00:00 ps -ef
postgres  45096  41369  0 12:08 pts/1    00:00:00 grep --color=auto postgres
-bash-4.2$ pwd
/var/lib/pgsql/14/data/global
-bash-4.2$ ls -l pg_control
-rw-------. 1 postgres postgres 8192 May 30 12:04 pg_control
-bash-4.2$ rm -rf pg_control
-bash-4.2$ ls -l pg_control
ls: cannot access pg_control: No such file or directory
PostgreSQL fails to start
-bash-4.2$ pg_ctl start
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....postgres: could not find the database system
Expected to find it in the directory "/var/lib/pgsql/14/data",
but could not open file "/var/lib/pgsql/14/data/global/pg_control": No such file or directory
 stopped waiting
pg_ctl: could not start server
Examine the log output.
Creating an empty pg_control file still does not let the server start
-bash-4.2$ touch /var/lib/pgsql/14/data/global/pg_control
-bash-4.2$ pg_ctl start
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....2022-05-30 12:09:43.953 CST [45215] PANIC:  could not read file "global/pg_control": read 0 of 296
 stopped waiting
pg_ctl: could not start server
Examine the log output.
Setting next-wal-file
-l, --next-wal-file=WALFILE: this parameter sets the minimum starting location for new WAL. To find the value, look at the last WAL file in the $PGDATA/pg_wal directory and add 1 to that file's ID.
-bash-4.2$ pwd
/var/lib/pgsql/14/data/pg_wal
-bash-4.2$ ls -l
total 16384
-rw-------. 1 postgres postgres 16777216 May 30 12:04 000000010000000000000014
drwx------. 2 postgres postgres        6 May 24 02:20 archive_status
-bash-4.2$
Adding 1 to this file name gives -l 000000010000000000000015
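The "+1" on a WAL file name can be sketched as a small shell function. This is an illustration under stated assumptions, not part of the original procedure: it assumes the usual 24-character name made of timeline, log number and segment number (8 hex digits each) and the default 16 MB segment size, so the segment part rolls over after 0xFF. The function name `next_wal_name` is hypothetical.

```shell
# Hypothetical helper: print the WAL segment file name that follows the given one.
# Assumes an 8+8+8 hex-digit name and the default 16 MB segment size,
# i.e. 256 segments per log number.
next_wal_name() {
    tli=$(printf '%s' "$1" | cut -c1-8)    # timeline ID, kept unchanged
    log=$(printf '%s' "$1" | cut -c9-16)   # high part of the segment position
    seg=$(printf '%s' "$1" | cut -c17-24)  # low part of the segment position
    log=$((0x$log))
    seg=$((0x$seg + 1))
    if [ "$seg" -gt 255 ]; then            # roll over into the next log number
        log=$((log + 1))
        seg=0
    fi
    printf '%s%08X%08X\n' "$tli" "$log" "$seg"
}

next_wal_name 000000010000000000000014
```

For the file in the listing above this gives the same value that was chosen by hand for -l.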
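Setting next-transaction
-x, --next-transaction-id=XID: this parameter sets the next XID value in pg_control. The value can be derived from the file names in the pg_xact directory.
-bash-4.2$ pwd
/var/lib/pgsql/14/data/pg_xact
-bash-4.2$ ls -ltr
total 8
-rw-------. 1 postgres postgres 8192 May 30 12:03 0000
The last file is 0000, so the next one is 0001. Multiply that by 1048576 (0x100000), which in practice just means appending five zeros. Note that the value is hexadecimal: -x 0x000100000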
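The rule above (largest pg_xact file name, plus 1, times 0x100000) can be written as a one-line calculation. This is a minimal sketch; the function name is made up for the example and the input is the hexadecimal file name as shown by ls.

```shell
# Hypothetical helper: derive a value for -x from the largest file name in pg_xact.
# Applies the rule from the text: (file name + 1) * 0x100000, printed in hex.
next_xid_from_segment() {
    printf '0x%X\n' $(( (0x$1 + 1) * 0x100000 ))
}

next_xid_from_segment 0000
```

The result is printed without leading zeros, so it is numerically the same as the 0x000100000 used in the text.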
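Setting multixact-ids
-m, --multixact-ids=MXID1,MXID2: this parameter has two parts, MXID1 and MXID2, and both can be obtained from the $PGDATA/pg_multixact/offsets directory. For MXID1, take the largest file name, add 1, and multiply by 65536 (0x10000, equivalent to appending four zeros); this is the first half of the parameter. For MXID2, take the smallest file name and append four zeros.
-bash-4.2$ pwd
/var/lib/pgsql/14/data/pg_multixact/offsets
-bash-4.2$ ls -ltr
total 8
-rw-------. 1 postgres postgres 8192 May 29 22:06 0000
-bash-4.2$
-m 0x00010000,0x00000000 (because the oldest multitransaction ID must not be 0, this value has to be adjusted later)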
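The two halves of -m can be computed the same way. The sketch below only applies the rule from the text and warns when the second value comes out as 0; how to adjust it in that case is left to the operator, since the text only says that the value needs adjusting. The function name is hypothetical.

```shell
# Hypothetical helper: derive the -m argument from pg_multixact/offsets file names.
#   $1 = largest file name, $2 = smallest file name (both hexadecimal)
# MXID1 = (largest + 1) * 0x10000, MXID2 = smallest * 0x10000.
multixact_ids_from_segments() {
    next=$(( (0x$1 + 1) * 0x10000 ))
    oldest=$(( 0x$2 * 0x10000 ))
    if [ "$oldest" -eq 0 ]; then
        # pg_resetwal rejects 0 here, so flag it instead of guessing a fix.
        echo "warning: oldest multitransaction ID is 0 and must be adjusted" >&2
    fi
    printf '0x%08X,0x%08X\n' "$next" "$oldest"
}

multixact_ids_from_segments 0000 0000
```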
Setting multixact-offset
-O, --multixact-offset=OFFSET: this parameter can be obtained from the $PGDATA/pg_multixact/members directory. Take the largest file name, add 1, and multiply by 52352 (0xCC80).
-bash-4.2$ pwd
/var/lib/pgsql/14/data/pg_multixact/members
-bash-4.2$ ls -ltr
total 8
-rw-------. 1 postgres postgres 8192 May 24 02:20 0000
-O 0xCC80
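As with the other parameters, the -O value follows mechanically from the file name. A minimal sketch, with a made-up function name, applying the rule (largest file name + 1) * 0xCC80:

```shell
# Hypothetical helper: derive a value for -O from the largest file name
# in pg_multixact/members, using (file name + 1) * 0xCC80 from the text.
next_multixact_offset() {
    printf '0x%X\n' $(( (0x$1 + 1) * 0xCC80 ))
}

next_multixact_offset 0000
```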
Trying to run pg_resetwal
-bash-4.2$ pg_resetwal -l 000000010000000000000015 -x 0x000100000 -m 0x00010000,0x00000000 -O 0xCC80 $PGDATA
pg_resetwal: error: oldest multitransaction ID (-m) must not be 0
The multixact-ids value is not accepted, so it is adjusted before continuing.
The postmaster.pid file needs to be cleaned up
Because PostgreSQL was shut down abnormally, this file has to be removed manually.
-bash-4.2$ pg_resetwal -l 000000010000000000000015 -x 0x000100000 -m 0x00020000,0x00010000 -O 0xCC80 $PGDATA
pg_resetwal: error: lock file "postmaster.pid" exists
-bash-4.2$ rm -rf postmaster.pid
Previewing the pg_resetwal result
-bash-4.2$ pg_resetwal -l 000000010000000000000015 -x 0x000100000 -m 0x00020000,0x00010000 -O 0xCC80 $PGDATA
pg_resetwal: warning: pg_control exists but is broken or wrong version; ignoring it
Guessed pg_control values:

pg_control version number:            1300
Catalog version number:               202107181
Database system identifier:           7103392535324046312
Latest checkpoint's TimeLineID:       1
Latest checkpoint's full_page_writes: off
Latest checkpoint's NextXID:          0:3
Latest checkpoint's NextOID:          12000
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Latest checkpoint's oldestXID:        3
Latest checkpoint's oldestXID's DB:   0
Latest checkpoint's oldestActiveXID:  0
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 0
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Size of a large-object chunk:         2048
Date/time type storage:               64-bit integers
Float8 argument passing:              by value
Data page checksum version:           0

Values to be changed:

First log segment after reset:        000000010000000000000015
NextMultiXactId:                      131072
OldestMultiXid:                       65536
OldestMulti's DB:                     0
NextMultiOffset:                      52352
NextXID:                              1048576
OldestXID:                            3
OldestXID's DB:                       0

If these values seem acceptable, use -f to force reset.
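The preview above appears because pg_resetwal refuses to write without -f. The help text also lists -n, --dry-run, which asks for a preview explicitly. The sketch below only assembles and prints such a command line from the values chosen earlier; the function name is hypothetical and nothing is executed against a data directory.

```shell
# Hypothetical helper: print a dry-run pg_resetwal command line.
#   $1 = WAL file for -l, $2 = next XID for -x, $3 = "MXID1,MXID2" for -m,
#   $4 = offset for -O, $5 = data directory
# -n (--dry-run) comes from the help output; the command is only printed here.
resetwal_preview_cmd() {
    printf 'pg_resetwal -n -l %s -x %s -m %s -O %s %s\n' "$1" "$2" "$3" "$4" "$5"
}

resetwal_preview_cmd 000000010000000000000015 0x000100000 \
    0x00020000,0x00010000 0xCC80 /var/lib/pgsql/14/data
```

Reviewing the printed command and its dry-run output before adding -f keeps the destructive step separate from the parameter checking.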
Running pg_resetwal to create pg_control and starting PostgreSQL
-bash-4.2$ pg_resetwal -l 000000010000000000000015 -x 0x000100000 -m 0x00020000,0x00010000 -O 0xCC80 -f $PGDATA
pg_resetwal: warning: pg_control exists but is broken or wrong version; ignoring it
Write-ahead log reset
-bash-4.2$ pg_ctl start
waiting for server to start....2022-05-30 13:33:28.266 CST [51437] LOG:  redirecting log output to logging collector process
2022-05-30 13:33:28.266 CST [51437] HINT:  Future log output will appear in directory "log".
 done
server started
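Verifying the data
-bash-4.2$ psql
psql (14.3)
Type "help" for help.

postgres=# select count(1) from ac_event;
 count
--------
 245275
(1 row)
Note that the table now has 245275 rows, compared with 246266 before the failure, so some recent changes were lost. After recovering with this method, it is recommended to dump the data immediately and import it into a new database.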