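PostgreSQL Recovery Series: Recovering from Lost WAL Logs
WAL is short for Write-Ahead Log and is similar to Oracle's redo log. It is stored in $PGDATA/pg_xlog, or in $PGDATA/pg_wal from version 10 onward. In Oracle, lost redo is handled differently depending on whether the log is active/current or inactive; when the lost redo is needed for instance recovery, the database consistency checks have to be bypassed and the database forced open. In PostgreSQL, the loss of this kind of log is handled mainly with pg_resetwal (pg_resetxlog before version 10). The test below verifies this.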
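As a small aid for scripts that have to work across versions, the naming change described above can be captured in one lookup. This is only a sketch: the function name `wal_layout` is made up for illustration, and it assumes the caller already knows the major version (for example from the PG_VERSION file).

```python
from typing import Tuple


def wal_layout(major_version: int) -> Tuple[str, str]:
    """Return (WAL directory name, reset tool name) for a PostgreSQL major version."""
    # Version 10 renamed both the directory and the tool.
    if major_version >= 10:
        return ("pg_wal", "pg_resetwal")
    return ("pg_xlog", "pg_resetxlog")


if __name__ == "__main__":
    for version in (9, 14):
        directory, tool = wal_layout(version)
        print(f"PostgreSQL {version}: $PGDATA/{directory}, reset with {tool}")
```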
Create a test table and force-kill the database (kill -9 on the postmaster, PID 11102)
-bash-4.2$ psql
psql (14.3)
Type "help" for help.

postgres=# create table t_xifenfei as select * from pg_database;
SELECT 4
postgres=# select count(1) from t_xifenfei;
 count
-------
     4
(1 row)

postgres=# \q
-bash-4.2$ ps -ef|grep post
root 1819 1 0 May28 ? 00:00:00 /usr/libexec/postfix/master -w
postfix 1838 1819 0 May28 ? 00:00:00 qmgr -l -t unix -u
postgres 11102 1 0 05:49 ? 00:00:00 /usr/pgsql-14/bin/postgres -D /var/lib/pgsql/14/data
postgres 11103 11102 0 05:49 ? 00:00:00 postgres: logger
postgres 11105 11102 0 05:49 ? 00:00:00 postgres: checkpointer
postgres 11106 11102 0 05:49 ? 00:00:00 postgres: background writer
postgres 11107 11102 0 05:49 ? 00:00:00 postgres: walwriter
postgres 11108 11102 0 05:49 ? 00:00:00 postgres: autovacuum launcher
postgres 11109 11102 0 05:49 ? 00:00:01 postgres: stats collector
postgres 11110 11102 0 05:49 ? 00:00:00 postgres: logical replication launcher
root 22743 22300 0 18:26 pts/3 00:00:00 su - postgres
postgres 22744 22743 0 18:26 pts/3 00:00:00 -bash
postgres 22937 22744 0 18:28 pts/3 00:00:00 psql
postgres 22938 11102 0 18:28 ? 00:00:00 postgres: postgres postgres [local] idle
postfix 32623 1819 0 21:10 ? 00:00:00 pickup -l -t unix -u
root 33032 32912 0 21:15 pts/2 00:00:00 su - postgres
postgres 33033 33032 0 21:15 pts/2 00:00:00 -bash
postgres 35210 33033 0 21:51 pts/2 00:00:00 ps -ef
postgres 35211 33033 0 21:51 pts/2 00:00:00 grep --color=auto post
-bash-4.2$ kill -9 11102
Delete the WAL logs
-bash-4.2$ pwd
/var/lib/pgsql/14/data/pg_wal
-bash-4.2$ ls -ltr
total 311296
drwx------. 2 postgres postgres 6 May 24 02:20 archive_status
-rw-------. 1 postgres postgres 16777216 May 28 21:29 000000010000000000000014
-rw-------. 1 postgres postgres 16777216 May 28 21:29 000000010000000000000015
-rw-------. 1 postgres postgres 16777216 May 28 21:29 000000010000000000000016
-rw-------. 1 postgres postgres 16777216 May 28 21:29 000000010000000000000017
-rw-------. 1 postgres postgres 16777216 May 28 21:29 000000010000000000000018
-rw-------. 1 postgres postgres 16777216 May 28 21:29 000000010000000000000019
-rw-------. 1 postgres postgres 16777216 May 28 21:29 00000001000000000000001A
-rw-------. 1 postgres postgres 16777216 May 28 21:29 00000001000000000000001B
-rw-------. 1 postgres postgres 16777216 May 28 21:29 00000001000000000000001C
-rw-------. 1 postgres postgres 16777216 May 28 21:29 00000001000000000000001D
-rw-------. 1 postgres postgres 16777216 May 28 21:29 00000001000000000000001E
-rw-------. 1 postgres postgres 16777216 May 28 21:29 00000001000000000000001F
-rw-------. 1 postgres postgres 16777216 May 28 21:29 000000010000000000000020
-rw-------. 1 postgres postgres 16777216 May 28 21:29 000000010000000000000021
-rw-------. 1 postgres postgres 16777216 May 28 21:29 000000010000000000000022
-rw-------. 1 postgres postgres 16777216 May 28 21:30 000000010000000000000023
-rw-------. 1 postgres postgres 16777216 May 28 21:30 000000010000000000000024
-rw-------. 1 postgres postgres 16777216 May 28 21:30 000000010000000000000025
-rw-------. 1 postgres postgres 16777216 May 29 21:51 000000010000000000000013
-bash-4.2$ rm -rf 0000000100000000000000*
-bash-4.2$ ls
archive_status
Check the oldest WAL file the database needs at this point
-bash-4.2$ pg_controldata
pg_control version number: 1300
Catalog version number: 202107181
Database system identifier: 7100998319216817119
Database cluster state: in production
pg_control last modified: Sat 28 May 2022 09:36:11 PM CST
Latest checkpoint location: 0/13692F80
Latest checkpoint's REDO location: 0/13692F48
Latest checkpoint's REDO WAL file: 000000010000000000000013 <=== the WAL file that is required
Latest checkpoint's TimeLineID: 1
Latest checkpoint's PrevTimeLineID: 1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0:17824
Latest checkpoint's NextOID: 32769
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Latest checkpoint's oldestXID: 727
Latest checkpoint's oldestXID's DB: 1
Latest checkpoint's oldestActiveXID: 17824
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint: Sat 28 May 2022 09:31:41 PM CST
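The link between the REDO location and the REDO WAL file in the output above can be reproduced by hand, which helps when only an LSN is known. The sketch below is not part of the original test: the helper names `parse_controldata` and `wal_file_name` are hypothetical, and it assumes the default 16 MB WAL segment size, which matches the 16777216-byte files listed earlier.

```python
from typing import Dict

# Assumption: default WAL segment size (16 MB), as seen in the pg_wal listing.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_controldata(text: str) -> Dict[str, str]:
    """Turn pg_controldata's 'label: value' lines into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only; values such as timestamps contain colons.
        label, sep, value = line.partition(":")
        if sep:
            fields[label.strip()] = value.strip()
    return fields


def wal_file_name(timeline: int, lsn: str, segment_size: int = WAL_SEGMENT_SIZE) -> str:
    """Name of the WAL segment file containing an LSN written as 'hi/lo' in hex."""
    high, low = (int(part, 16) for part in lsn.split("/"))
    segment_number = ((high << 32) | low) // segment_size
    segments_per_id = 0x100000000 // segment_size
    return "%08X%08X%08X" % (
        timeline,
        segment_number // segments_per_id,
        segment_number % segments_per_id,
    )


if __name__ == "__main__":
    sample = (
        "Latest checkpoint's REDO location: 0/13692F48\n"
        "Latest checkpoint's REDO WAL file: 000000010000000000000013\n"
        "Latest checkpoint's TimeLineID: 1\n"
    )
    info = parse_controldata(sample)
    computed = wal_file_name(
        int(info["Latest checkpoint's TimeLineID"]),
        info["Latest checkpoint's REDO location"],
    )
    print(computed, computed == info["Latest checkpoint's REDO WAL file"])
```

For timeline 1 and LSN 0/13692F48 this yields 000000010000000000000013, the file that pg_controldata reports as required and that was removed in the previous step.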
Try to start PG
-bash-4.2$ pg_ctl start
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....2022-05-29 21:52:22.926 CST [35270] LOG: redirecting log output to logging collector process
2022-05-29 21:52:22.926 CST [35270] HINT: Future log output will appear in directory "log".
. stopped waiting
pg_ctl: could not start server
Examine the log output.
PG failed to start; check the log
2022-05-29 21:52:22.926 CST [35270] LOG: starting PostgreSQL 14.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
2022-05-29 21:52:22.927 CST [35270] LOG: listening on IPv6 address "::1", port 5432
2022-05-29 21:52:22.927 CST [35270] LOG: listening on IPv4 address "127.0.0.1", port 5432
2022-05-29 21:52:22.929 CST [35270] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2022-05-29 21:52:22.931 CST [35270] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2022-05-29 21:52:22.936 CST [35272] LOG: database system was interrupted; last known up at 2022-05-28 21:36:11 CST
2022-05-29 21:52:23.049 CST [35272] LOG: invalid primary checkpoint record
2022-05-29 21:52:23.049 CST [35272] PANIC: could not locate a valid checkpoint record
2022-05-29 21:52:24.211 CST [35270] LOG: startup process (PID 35272) was terminated by signal 6: Aborted
2022-05-29 21:52:24.211 CST [35270] LOG: aborting startup due to startup process failure
2022-05-29 21:52:24.218 CST [35270] LOG: database system is shut down
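The error is clear: PostgreSQL cannot locate a valid checkpoint record. In Oracle terms, this means instance recovery cannot be performed, so PG fails to start.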
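When the log is long, the decisive line can be pulled out mechanically. The following is a minimal sketch, not a general log parser: the function name `fatal_entries` is made up, and the regular expression assumes the line prefix seen in the log above (date, time, time zone, PID in brackets, then the severity).

```python
import re
from typing import List, Tuple

# Assumed prefix: "<date> <time> <tz> [<pid>] <SEVERITY>: <message>"
LOG_LINE = re.compile(r"^(\S+ \S+ \S+) \[(\d+)\] ([A-Z]+):\s+(.*)$")


def fatal_entries(log_text: str) -> List[Tuple[str, str, str]]:
    """Return (timestamp, severity, message) for every PANIC or FATAL line."""
    hits = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and match.group(3) in ("PANIC", "FATAL"):
            hits.append((match.group(1), match.group(3), match.group(4)))
    return hits


if __name__ == "__main__":
    sample = (
        "2022-05-29 21:52:23.049 CST [35272] LOG: invalid primary checkpoint record\n"
        "2022-05-29 21:52:23.049 CST [35272] PANIC: could not locate a valid checkpoint record\n"
    )
    for timestamp, severity, message in fatal_entries(sample):
        print(timestamp, severity, message)
```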
Reset the WAL
Because the database is in an inconsistent state, the -f option is needed to force the reset.
-bash-4.2$ pg_resetwal $PGDATA
The database server was not shut down cleanly.
Resetting the write-ahead log might cause data to be lost.
If you want to proceed anyway, use -f to force reset.
-bash-4.2$ pg_resetwal -f $PGDATA
Write-ahead log reset
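Since a forced reset cannot be undone, one cautious approach is to copy the stopped data directory first and to preview the reset with pg_resetwal's -n (dry-run) option before using -f. The sketch below was not part of the test above; the helper names and the paths in the usage section are hypothetical, and tablespaces stored outside the data directory are not covered by the copy.

```python
import shutil
from pathlib import Path
from typing import List


def snapshot_datadir(pgdata: str, backup_dir: str) -> Path:
    """Copy a stopped cluster's data directory so the pre-reset state is kept."""
    destination = Path(backup_dir)
    # copytree raises if the destination already exists, so an earlier
    # snapshot is not overwritten by accident. Symlinks are copied as links.
    shutil.copytree(pgdata, destination, symlinks=True)
    return destination


def reset_command(pgdata: str, force: bool = False, dry_run: bool = False) -> List[str]:
    """Build the pg_resetwal argument list without running it."""
    command = ["pg_resetwal"]
    if dry_run:
        command.append("-n")  # only print the values, change nothing
    if force:
        command.append("-f")  # needed when the server was not shut down cleanly
    command.append(pgdata)
    return command


if __name__ == "__main__":
    # Hypothetical path, shown only to illustrate the order of steps.
    pgdata = "/var/lib/pgsql/14/data"
    print("preview:", " ".join(reset_command(pgdata, dry_run=True)))
    print("reset:  ", " ".join(reset_command(pgdata, force=True)))
```

The commands are only built and printed here; running them (for example with subprocess.run) is left to the operator once the snapshot has been verified.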
PG now starts successfully
-bash-4.2$ pg_ctl start -D $PGDATA
waiting for server to start....2022-05-29 22:01:02.647 CST [37178] LOG: redirecting log output to logging collector process
2022-05-29 22:01:02.647 CST [37178] HINT: Future log output will appear in directory "log".
 done
server started
Log output
2022-05-29 22:01:02.647 CST [37178] LOG: starting PostgreSQL 14.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
2022-05-29 22:01:02.648 CST [37178] LOG: listening on IPv6 address "::1", port 5432
2022-05-29 22:01:02.648 CST [37178] LOG: listening on IPv4 address "127.0.0.1", port 5432
2022-05-29 22:01:02.649 CST [37178] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2022-05-29 22:01:02.651 CST [37178] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2022-05-29 22:01:02.653 CST [37180] LOG: database system was shut down at 2022-05-29 22:00:47 CST
2022-05-29 22:01:02.661 CST [37178] LOG: database system is ready to accept connections
Check the WAL directory: a new segment has been created
-bash-4.2$ pwd
/var/lib/pgsql/14/data/pg_wal
-bash-4.2$ ls -ltr
total 16384
drwx------. 2 postgres postgres 6 May 24 02:20 archive_status
-rw-------. 1 postgres postgres 16777216 May 29 22:01 000000010000000000000014
Verify the test table created earlier
-bash-4.2$ psql
psql (14.3)
Type "help" for help.

postgres=# select count(1) from t_xifenfei;
ERROR: relation "t_xifenfei" does not exist
LINE 1: select count(1) from t_xifenfei;
                             ^
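Because the WAL needed for instance (crash) recovery was lost, the table created just before the kill is lost as well. This shows that the operation carries a real risk of data loss and should be used with great caution in production environments.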