2008-6-3 10:08
babyzhou
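/var suddenly at 100%!!! Urgent...
Logged in this morning and found that the /var file system on one production machine is 100% full (/var is allocated 512 MB).
AIX 5300-03-00, Oracle 9.2 RAC, HA 5.1
[color=Red]1.[/color] Ran df and confirmed it is at 100%.
[color=Red]2.[/color] Ran errpt; it looks like a core file was generated.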
[/var]>errpt
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
A96B4002 0603010208 I O grpsvcs Group Services informational message
[color=DarkGreen]A96B4002[/color] 0603010208 I O grpsvcs Group Services informational message
[color=Blue]173C787F [/color] 0603010208 I S topsvcs Possible malfunction on local adapter
[color=Indigo]A63BEB70 [/color] 0603010108 P S SYSPROC SOFTWARE PROGRAM ABNORMALLY TERMINATED
F7FA22C9 0603010108 I O SYSJ2 UNABLE TO ALLOCATE SPACE IN FILE SYSTEM
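The TIMESTAMP column in the errpt summary uses the MMDDhhmmYY layout, so 0603010208 is June 3, 01:02, 2008. A minimal sketch for decoding it follows; the function name decode_errpt_ts is made up for illustration, and the "20" century prefix is an assumption of this sketch.

```shell
# Decode an errpt summary timestamp (MMDDhhmmYY) into YYYY-MM-DD hh:mm.
# Assumption of this sketch: the two-digit year belongs to the 2000s.
decode_errpt_ts() {
    ts=$1
    # Accept exactly ten digits, reject anything else.
    case $ts in
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
        *) echo "not a 10-digit errpt timestamp: $ts" >&2; return 1 ;;
    esac
    mm=$(printf '%s' "$ts" | cut -c1-2)   # month
    dd=$(printf '%s' "$ts" | cut -c3-4)   # day
    hh=$(printf '%s' "$ts" | cut -c5-6)   # hour
    mi=$(printf '%s' "$ts" | cut -c7-8)   # minute
    yy=$(printf '%s' "$ts" | cut -c9-10)  # two-digit year
    printf '20%s-%s-%s %s:%s\n' "$yy" "$mm" "$dd" "$hh" "$mi"
}

decode_errpt_ts 0603010208   # prints 2008-06-03 01:02
```

Read this way, the CORE_DUMP and SYSJ2 entries (01:01) come one minute before the topsvcs and grpsvcs entries (01:02).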
[color=Indigo]A63BEB70[/color]
[/var]>errpt -aj A63BEB70
---------------------------------------------------------------------------
LABEL: CORE_DUMP
IDENTIFIER: A63BEB70
Date/Time: Tue Jun 3 01:01:27 BEIDT 2008
Sequence Number: 1029
Machine Id: 00CE7C5A4C00
Node Id: zjdxcwm8u01
Class: S
Type: PERM
Resource Name: SYSPROC
Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED
Probable Causes
SOFTWARE PROGRAM
User Causes
USER GENERATED SIGNAL
Recommended Actions
CORRECT THEN RETRY
Failure Causes
SOFTWARE PROGRAM
Recommended Actions
RERUN THE APPLICATION PROGRAM
IF PROBLEM PERSISTS THEN DO THE FOLLOWING
CONTACT APPROPRIATE SERVICE REPRESENTATIVE
Detail Data
SIGNAL NUMBER
11
USER'S PROCESS ID:
1241354
FILE SYSTEM SERIAL NUMBER
4
INODE NUMBER
393
PROCESSOR ID
4
CORE FILE NAME
/var/ha/run/topsvcs.cluster1/core
PROGRAM NAME
hats_rs232_nim
STACK EXECUTION DISABLED
0
ADDITIONAL INFORMATION
prepare_m 44
pack_inco 30
pack_inco 30
pack_inco 88
__20NIM_m 200
main_loop 3C0
main 1C8
__start 8C
Symptom Data
REPORTABLE
1
INTERNAL ERROR
0
SYMPTOM CODE
PCSS/SPI2 FLDS/hats_rs23 SIG/11 FLDS/prepare_m VALU/44 FLDS/pack_inco
[color=Blue]173C787F[/color]
[/var]>errpt -aj 173C787F
---------------------------------------------------------------------------
LABEL: TS_LOC_DOWN_ST
IDENTIFIER: 173C787F
Date/Time: Tue Jun 3 01:02:27 BEIDT 2008
Sequence Number: 1030
Machine Id: 00CE7C5A4C00
Node Id: zjdxcwm8u01
Class: S
Type: INFO
Resource Name: topsvcs
Description
Possible malfunction on local adapter
Probable Causes
Local adapter mal-functioned
Local adapter lost connection to network
Local adapter mis-configured
Failure Causes
Local adapter mal-functioned
Local adapter lost connection to network
Local adapter mis-configured
Recommended Actions
Verify adapter configuration
Verify network connectivity
Detail Data
DETECTING MODULE
rsct,nim_control.C,1.39.1.1,3701
ERROR ID
6zV5DL.Ha/F6/dQg1/E4e.1...................
REFERENCE CODE
Adapter interface name
tty0
Adapter offset
2
Adapter IP address
255.255.0.0
[color=DarkGreen]A96B4002[/color]
[/var]>errpt -aj A96B4002
---------------------------------------------------------------------------
LABEL: GS_MESSAGE_ST
IDENTIFIER: A96B4002
Date/Time: Tue Jun 3 01:02:28 BEIDT 2008
Sequence Number: 1032
Machine Id: 00CE7C5A4C00
Node Id: zjdxcwm8u01
Class: O
Type: INFO
Resource Name: grpsvcs
Description
Group Services informational message
Probable Causes
Informational message
Failure Causes
Informational message
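[color=Red]3.[/color] Ran find under /var for files larger than 100 MB created in the last 24 hours, and there is indeed a core file of more than 200 MB,
in /var/ha/run/topsvcs.cluster1. I backed the core up and then deleted it.
[color=Red]4.[/color] Below are the HA-related logs; the heartbeat seems to have had a problem.
[[i] Last edited by babyzhou at 2008-6-3 10:39 [/i]]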
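The search described in step 3 can be sketched with find. The post does not show the exact command that was run, so the following is only one plausible form; the function name find_big_recent and its default threshold are assumptions for illustration.

```shell
# List regular files under a directory that are larger than a given number
# of megabytes and were modified within the last 24 hours.
# find_big_recent is a hypothetical helper, not the command from the post.
find_big_recent() {
    dir=$1
    min_mb=${2:-100}
    # find -size without a suffix counts 512-byte blocks: 1 MB = 2048 blocks.
    blocks=$((min_mb * 2048))
    # -xdev keeps the search inside the file system that holds $dir.
    find "$dir" -xdev -type f -size +"$blocks" -mtime -1 -print
}

# Usage (not executed here): find_big_recent /var 100
```

Staying on one file system with -xdev matters when other file systems are mounted below /var, since only /var itself is full.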
2008-6-3 10:29
babyzhou
[color=Red]1.[/color][/tmp]>ls -alt|pg
total 9424
drwxrwxrwt 17 bin bin 12288 Jun 03 11:10 .
-rw-r--r-- 1 root system 250904 Jun 03 01:03 clstrmgr.debug
-rw-r--r-- 1 root system 0 Jun 03 01:02 clinfo.rc.out
-rw-r--r-- 1 root system 5675 Jun 03 01:02 hacmp.out
-rw-r--r-- 1 root system 2343 Jun 03 01:02 hacmprd_rcovcmd.err
[color=Red]2.cat /tmp/hacmp.out[/color]
Jun 3 01:02:30 EVENT START: network_down -1 net_rs232_01
:network_down[62] [[ high = high ]]
:network_down[62] version=1.23
:network_down[63] :network_down[63] cl_get_path
HA_DIR=es
:network_down[65] [ 2 -ne 2 ]
:network_down[77] :network_down[77] cl_rrmethods2call net_cleanup
:cl_rrmethods2call[49] [[ high = high ]]
:cl_rrmethods2call[49] version=1.8
:cl_rrmethods2call[50] :cl_rrmethods2call[50] cl_get_path
HA_DIR=es
:cl_rrmethods2call[63] :cl_rrmethods2call[63] odmget -qname=net_rs232_01 HACMPnetwork
:cl_rrmethods2call[63] egrep nimname
:cl_rrmethods2call[63] sed s/"//g
:cl_rrmethods2call[63] awk {print $3}
RRNET=rs232
:cl_rrmethods2call[63] [[ rs232 = Geo_Primary ]]
:cl_rrmethods2call[70] :cl_rrmethods2call[70] odmget -qtype=2 HACMPrresmethods
:cl_rrmethods2call[70] egrep net_cleanup =
:cl_rrmethods2call[70] sed s/"//g
:cl_rrmethods2call[70] awk {print $3}
RRMETHODS=
:cl_rrmethods2call[72] echo
:cl_rrmethods2call[73] exit 0
METHODS=
:network_down[91] set -u
:network_down[104] exit 0
Jun 3 01:02:30 EVENT COMPLETED: network_down -1 net_rs232_01 0
HACMP Event Summary
Event: network_down -1 net_rs232_01
Start time: Tue Jun 3 01:02:30 2008
End time: Tue Jun 3 01:02:30 2008
Action: Resource: Script Name:
----------------------------------------------------------------------------
No resources changed as a result of this event
----------------------------------------------------------------------------
Jun 3 01:02:30 EVENT START: network_down_complete -1 net_rs232_01
:network_down_complete[61] [[ high = high ]]
:network_down_complete[61] version=1.1.1.13
:network_down_complete[62] :network_down_complete[62] cl_get_path
HA_DIR=es
:network_down_complete[64] [ ! -n ]
:network_down_complete[66] EMULATE=REAL
:network_down_complete[69] [ 2 -ne 2 ]
:network_down_complete[75] set -u
:network_down_complete[81] STATUS=0
:network_down_complete[85] odmget HACMPnode
:network_down_complete[85] grep name =
:network_down_complete[85] sort
:network_down_complete[85] uniq
:network_down_complete[85] wc -l
:network_down_complete[85] [ 2 -eq 2 ]
:network_down_complete[87] :network_down_complete[87] odmget HACMPgroup
:network_down_complete[87] grep group =
:network_down_complete[87] sed s/"//g
:network_down_complete[87] awk {print $3}
RESOURCE_GROUPS=datavg
casg1
casg2
:network_down_complete[91] :network_down_complete[91] odmget -q group=datavg AND name=EXPORT_FILESYSTEM HACMPresource
:network_down_complete[91] grep value
:network_down_complete[91] sed s/"//g
:network_down_complete[91] awk {print $3}
EXPORTLIST=
:network_down_complete[92] [ -n ]
:network_down_complete[91] :network_down_complete[91] odmget -q group=casg1 AND name=EXPORT_FILESYSTEM HACMPresource
:network_down_complete[91] grep value
:network_down_complete[91] sed s/"//g
:network_down_complete[91] awk {print $3}
EXPORTLIST=
:network_down_complete[92] [ -n ]
:network_down_complete[91] :network_down_complete[91] odmget -q group=casg2 AND name=EXPORT_FILESYSTEM HACMPresource
:network_down_complete[91] grep value
:network_down_complete[91] sed s/"//g
:network_down_complete[91] awk {print $3}
EXPORTLIST=
:network_down_complete[92] [ -n ]
:network_down_complete[114] cl_hb_alias_network net_rs232_01 add
:cl_hb_alias_network[57] [[ high = high ]]
:cl_hb_alias_network[57] version=1.4
:cl_hb_alias_network[58] :cl_hb_alias_network[58] cl_get_path
HA_DIR=es
:cl_hb_alias_network[60] NETWORK=net_rs232_01
:cl_hb_alias_network[61] ACTION=add
:cl_hb_alias_network[64] [[ 2 != 2 ]]
:cl_hb_alias_network[70] [[ add != add ]]
:cl_hb_alias_network[76] set -u
:cl_hb_alias_network[78] cl_echo 33 Starting execution of /usr/es/sbin/cluster/utilities/cl_hb_alias_network with parameters net_rs232_01 add\n /usr/es/sbin/cluster/utilities/cl_hb_alias_network net_rs232_01 add
:cl_echo[49] version=1.13
:cl_echo[98] HACMP_OUT_FILE=/tmp/hacmp.out
Jun 3 2008 01:02:30 Starting execution of /usr/es/sbin/cluster/utilities/cl_hb_alias_network with parameters net_rs232_01 add
:cl_hb_alias_network[79] date
Tue Jun 3 01:02:30 BEIDT 2008
:cl_hb_alias_network[81] :cl_hb_alias_network[81] get_local_nodename
:get_local_nodename[40] [[ high = high ]]
:get_local_nodename[40] version=1.2.3.2
:get_local_nodename[41] :get_local_nodename[41] cl_get_path
HA_DIR=es
:get_local_nodename[43] AIXODMDIR=/etc/objrepos
:get_local_nodename[44] HAODMDIR=/etc/es/objrepos
:get_local_nodename[48] export ODMDIR=/etc/es/objrepos
:get_local_nodename[50] :get_local_nodename[50] grep nodename
:get_local_nodename[50] odmget HACMPcluster
nodename= nodename = "erp01"
:get_local_nodename[51] :get_local_nodename[51] sed -e s/.*= *//g -e s/\"//g
:get_local_nodename[51] print nodename = "erp01"
nodename=zjdxcwm8u01
:get_local_nodename[53] :get_local_nodename[53] cut -d: -f1
:get_local_nodename[53] cllsnode -cS
NODENAME=erp01
zjdxcwm8u02
:get_local_nodename[57] [[ zjdxcwm8u01 = erp01 ]]
:get_local_nodename[60] print erp01
:get_local_nodename[61] exit 0
LOCALNODENAME=erp01
:cl_hb_alias_network[82] STATUS=0
:cl_hb_alias_network[85] cllsnw -Scn net_rs232_01
:cl_hb_alias_network[85] grep -q hb_over_alias
:cl_hb_alias_network[85] cut -d: -f4
:cl_hb_alias_network[85] exit 0
:network_down_complete[120] exit 0
Jun 3 01:02:30 EVENT COMPLETED: network_down_complete -1 net_rs232_01 0
HACMP Event Summary
Event: network_down_complete -1 net_rs232_01
Start time: Tue Jun 3 01:02:30 2008
End time: Tue Jun 3 01:02:30 2008
Action: Resource: Script Name:
----------------------------------------------------------------------------
No resources changed as a result of this event
----------------------------------------------------------------------------
[color=Red]3.cat /tmp/hacmprd_rcovcmd.err[/color]
run_rcovcmd: Called on Tue Jun 3 01:02:30 2008
argv[0]::run_rcovcmd
argv[1]::-sport
argv[2]::1000
argv[3]::-result_node
argv[4]::1
argv[5]::-script_id
argv[6]::-1_TE_FAIL_NETWORK_-1_0
argv[7]::-command_id
argv[8]::1_network_down_complete -1 net_rs232_01 _2
argv[9]::-command
argv[10]::network_down_complete -1 net_rs232_01
argv[11]::-environment
argv[12]::LANG=en_USMEMBERSHIP=1 2COORDINATOR=1TZ=BEIST-8BEIDTTIMESTAMP=Tue Jun 3 01:02:30 BEIDT 2008EVENT_NODE=-1RESGRP_datavg_erp01=ONLINERESGRP_datavg_erp02=ONLINERESGRP_casg1_erp01=ONLINERESGRP_casg2_erp02=ONLINENODEerp01=UPi192x168x100x1_erp01=UPi192x168x101x1_erp01=UPi10x10x10x26_erp01=UPpxdevxtty0_IFSOWNER=NODEerp02=UPi192x168x100x2_erp02=UPi192x168x101x2_erp02=UPi10x10x10x25_erp02=UPNUM_ACTIVE_NODES=2PRE_EVENT_MEMBERSHIP=erp01 erp02POST_EVENT_MEMBERSHIP=erp01 erp02CM_CLUSTER_ID=1134728307CM_CLUSTER_NAME=cluster1LOCALNODENAME=erp01EVENTSITENAME=LOCALNODEID=1PING_IP_ADDRESS= LC_FASTMSG=trueODMDIR=/usr/es/sbin/cluster/etc/objrepos/activePATH=/usr/bin:/etc:/usr/sbin:/usr/ucb:/usr/bin/X11:/sbinHACMP_VERSION=__PE__CLUSTER_MAJOR=51CLUSTER_MINOR=0VERBOSE_LOGGING=high
run_rcovcmd: User specified environment:
VERBOSE_LOGGING=high
CLUSTER_MINOR=0
CLUSTER_MAJOR=51
HACMP_VERSION=__PE__
PATH=/usr/bin:/etc:/usr/sbin:/usr/ucb:/usr/bin/X11:/sbin
ODMDIR=/usr/es/sbin/cluster/etc/objrepos/active
LC_FASTMSG=true
PING_IP_ADDRESS=
LOCALNODEID=1
EVENTSITENAME=
LOCALNODENAME=erp01
CM_CLUSTER_NAME=cluster1
CM_CLUSTER_ID=1134728307
POST_EVENT_MEMBERSHIP=erp01 erp02
PRE_EVENT_MEMBERSHIP=erp01 erp02
NUM_ACTIVE_NODES=2
i10x10x10x25_erp02=UP
i192x168x101x2_erp02=UP
i192x168x100x2_erp02=UP
NODEerp02=UP
pxdevxtty0_IFSOWNER=
i10x10x10x26_erp01=UP
i192x168x101x1_erp01=UP
i192x168x100x1_erp01=UP
NODEerp01=UP
RESGRP_casg2_erp02=ONLINE
RESGRP_casg1_erp01=ONLINE
RESGRP_datavg_erp02=ONLINE
RESGRP_datavg_erp01=ONLINE
EVENT_NODE=-1
TIMESTAMP=Tue Jun 3 01:02:30 BEIDT 2008
TZ=BEIST-8BEIDT
COORDINATOR=1
MEMBERSHIP=1 2
LANG=en_US
run_rcovcmd: Execution complete on Tue Jun 3 01:02:30 2008
Normal termination, exit status = 0
[color=Red]4.[/usr/es/adm]>cat cluster.log[/color]
..........
May 10 23:26:11 erp01 local0:crit clstrmgrES[766090]: Sat May 10 23:26:11 Setting Environment LANG=en_US
May 10 23:26:11 erp01 user:notice HACMP for AIX: EVENT START: network_down erp02 net_rs232_01
May 10 23:26:11 erp01 user:notice HACMP for AIX: EVENT COMPLETED: network_down erp02 net_rs232_01 0
May 10 23:26:13 erp01 user:notice HACMP for AIX: EVENT START: network_down_complete erp02 net_rs232_01
May 10 23:26:13 erp01 user:notice HACMP for AIX: EVENT COMPLETED: network_down_complete erp02 net_rs232_01 0
May 10 23:26:26 erp01 local0:info clinfoES[782418]: send_snmp_req: Messages in queue got = 5 read = 1
May 10 23:26:41 erp01 local0:info last message repeated 3 times
Jun 3 01:02:27 erp01 daemon:notice topsvcs[1015814]: (Recorded using libct_ffdc.a cv 2):::Error ID: 6zV5DL.Ha/F6/dQg1/E4e.1...................:::Reference ID: :::Template ID: 173c787f:::Details File: :::Location: rsct,nim_control.C,1.39.1.1,3701 :::TS_LOC_DOWN_ST Possible malfunction on local adapter Adapter interface name tty0 Adapter offset 2 Adapter IP address 255.255.0.0
Jun 3 01:02:28 erp01 daemon:err|error grpsvcs[991360]: (Recorded using libct_ffdc.a cv 2):::Error ID: 60.oOd0Ia/F6/g1/./E4e.1...................:::Reference ID: :::Template ID: a96b4002:::Details File: :::Location: RSCT,TraceStream.C,1.84,695 :::GS_MESSAGE_ST Group Services informational message DIAGNOSTIC EXPLANATION ERROR writing to log file /var/ha/log/grpsvcs_1_41.cluster1.long (rdstate=4 errno=28[There is not enough space in the file system.] lost=1). Check filesystem.
Jun 3 01:02:28 erp01 daemon:err|error grpsvcs[991360]: (Recorded using libct_ffdc.a cv 2):::Error ID: 60.oOd0Ia/F6/INf./E4e.1...................:::Reference ID: :::Template ID: a96b4002:::Details File: :::Location: RSCT,TraceStream.C,1.84,695 :::GS_MESSAGE_ST Group Services informational message DIAGNOSTIC EXPLANATION ERROR writing to log file /var/ha/log/grpsvcs_1_41.cluster1 (rdstate=4 errno=28[There is not enough space in the file system.] lost=1). Check filesystem.
[[i] Last edited by babyzhou at 2008-6-4 09:28 [/i]]
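The grpsvcs entries in cluster.log fail with errno=28 (no space left) and end with "Check filesystem". A small sketch of such a check, using the POSIX output format of df, might look like this; the helper names fs_used_pct and check_fs and the 90% threshold are assumptions for illustration, not part of HACMP or RSCT.

```shell
# Print the used-space percentage of the file system holding a path.
# df -P forces one line per file system, so field 5 is the capacity column.
fs_used_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Warn and return non-zero when usage reaches the threshold (default 90).
check_fs() {
    path=$1
    limit=${2:-90}
    pct=$(fs_used_pct "$path")
    if [ "$pct" -ge "$limit" ]; then
        echo "WARNING: $path is ${pct}% full (limit ${limit}%)"
        return 1
    fi
    return 0
}

# Usage (not executed here): check_fs /var 90
```

Run periodically, for example from cron, a check like this would flag /var before the daemons start losing log writes.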
2008-6-3 10:37
babyzhou
The application and the database show nothing abnormal.
2008-6-3 10:49
指尖流沙
So many logs, how is anyone supposed to read them? I can't get through them all. Call on-site support! :loveliness: :loveliness:
2008-6-3 10:55
babyzhou
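[quote]Originally posted by [i]指尖流沙[/i] at 2008-6-3 10:49 [url=http://bbs.loveunix.net/redirect.php?goto=findpost&pid=794645&ptid=85602][img]http://bbs.loveunix.net/images/common/back.gif[/img][/url]
So many logs, how is anyone supposed to read them? I can't get through them all. Call on-site support! :loveliness: :loveliness: [/quote]
Just pick the few you think are useful, :handshake . From the logs it looks like the rs232 heartbeat has a problem.
Everyone, please join the discussion when you have time.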
2008-6-3 10:56
blackcat
Then test all of your communication links.
2008-6-3 11:04
指尖流沙
A timid question: does HA need to be stopped to test the serial port?
2008-6-3 11:07
meilixueshan
No, it doesn't.
2008-6-3 11:14
babyzhou
Hoping that veterans such as 老农, larryh and orian can find some spare time to drop by
and give this novice some advice :lu4:
2008-6-3 11:22
kettyalx
F7FA22C9 0603010108 I O SYSJ2 UNABLE TO ALLOCATE SPACE IN FILE SYSTEM :D Take a look at this entry when you have time.
[[i] Last edited by kettyalx at 2008-6-3 11:24 [/i]]
2008-6-3 11:23
指尖流沙
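[quote]Originally posted by [i]meilixueshan[/i] at 2008-6-3 11:07 [url=http://bbs.loveunix.net/redirect.php?goto=findpost&pid=794656&ptid=85602][img]http://bbs.loveunix.net/images/common/back.gif[/img][/url]
No, it doesn't. [/quote]
Really? Then next time I can test it on production!! :loveliness: :loveliness: :loveliness: :loveliness:
The 800 support line said that when the CPU is too busy to handle serial port signals, the serial network can appear down after a timeout. I don't really understand it myself.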
2008-6-3 11:30
babyzhou
[quote]Originally posted by [i]kettyalx[/i] at 2008-6-3 11:22 [url=http://bbs.loveunix.net/redirect.php?goto=findpost&pid=794662&ptid=85602][img]http://bbs.loveunix.net/images/common/back.gif[/img][/url]
F7FA22C9 0603010108 I O SYSJ2 UNABLE TO ALLOCATE SPACE IN FILE SYSTEM :D Take a look at this entry when you have time. [/quote]
I did look at it.
It just says the /var file system is out of space. The problem is that core file; it may be related to HA, but I can't explain why. :lu3:
2008-6-3 11:31
babyzhou
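[quote]Originally posted by [i]指尖流沙[/i] at 2008-6-3 11:23 [url=http://bbs.loveunix.net/redirect.php?goto=findpost&pid=794664&ptid=85602][img]http://bbs.loveunix.net/images/common/back.gif[/img][/url]
Really? Then next time I can test it on production!! :loveliness: :loveliness: :loveliness: :loveliness:
The 800 support line said that when the CPU is too busy to handle serial port signals, the serial network can appear down after a timeout. I don't really understand it myself. [/quote]
Did you call the 800 line and confirm that the serial heartbeat can be tested without stopping HA?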
2008-6-3 12:42
void
hacmp core dump!
The core file can simply be deleted first; if you need to analyze it, back it up and then delete it.
[[i] Last edited by void at 2008-6-3 12:45 [/i]]
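The "back up, then delete" advice can be sketched as a small helper that compresses the core into a directory on another file system and removes the original only after the copy verifies. The name archive_core is made up for illustration, and the sketch assumes gzip is installed, which may not be the case on a stock AIX system.

```shell
# Compress a core file into dest_dir and delete the original only if the
# compressed copy passes an integrity test.
# archive_core is a hypothetical helper; gzip availability is an assumption.
archive_core() {
    core=$1
    dest_dir=$2
    if [ ! -f "$core" ]; then
        echo "no such core file: $core" >&2
        return 1
    fi
    dest="$dest_dir/core.$(date +%Y%m%d%H%M%S).gz"
    if gzip -c "$core" > "$dest" && gzip -t "$dest"; then
        rm -f "$core"      # safe to free the space now
        echo "$dest"       # report where the backup went
    else
        rm -f "$dest"      # do not leave a partial archive behind
        return 1
    fi
}

# Usage (not executed here; the destination directory is an example only):
# archive_core /var/ha/run/topsvcs.cluster1/core /some/other/filesystem
```

The destination should sit outside /var, otherwise the backup competes for the same 512 MB.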
2008-6-3 12:57
babyzhou
[quote]Originally posted by [i]void[/i] at 2008-6-3 12:42 [url=http://bbs.loveunix.net/redirect.php?goto=findpost&pid=794689&ptid=85602][img]http://bbs.loveunix.net/images/common/back.gif[/img][/url]
hacmp core dump!
The core file can simply be deleted first; if you need to analyze it, back it up and then delete it. [/quote]
Can the rough cause be determined from those logs?
2008-6-3 14:14
指尖流沙
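[quote]Originally posted by [i]babyzhou[/i] at 2008-6-3 11:31 [url=http://bbs.loveunix.net/redirect.php?goto=findpost&pid=794671&ptid=85602][img]http://bbs.loveunix.net/images/common/back.gif[/img][/url]
Did you call the 800 line and confirm that the serial heartbeat can be tested without stopping HA? [/quote]
A long time ago, what I asked them about was why the HA serial link occasionally went down and ERRPT reported errors similar to the original poster's!
I never asked whether testing the serial port requires stopping HA. Even if I did ask the 800 line, whatever answer they gave, in my actual experience the test does not get through when HA is left running~~~~ :lu4: :lu4: :lu4: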
2008-6-3 14:26
larryh
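Testing whether the serial link is up does not require stopping HA, but certainly not with the STTY method; it would be strange if that method worked without stopping HA!
But if the serial link is only intermittently down, and down only rarely, no test will help.
Deleting the core and applying patches is the only proper way. Analyze it? Analyze what? Is someone going to work out where the machine code is wrong and issue a patch in place of the IBM lab? Are there experts from the IBM lab here? If not, don't give random directions.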
2008-6-3 14:42
指尖流沙
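To me the moderators and super moderators look just like experts from the IBM lab~~ :P :P :P
I'd really like to visit the IBM lab! Any hope in this lifetime? Gazing long at the sky, facing south! :funk: :funk: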
2008-6-3 15:14
benq011
:lol :lol
2008-6-3 15:30
diyxyj
If a problem is hard to handle yourself, open a PMR and send them the snap.pax.Z package produced by snap -gc.
2008-6-3 16:45
babyzhou
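[quote]Originally posted by [i]larryh[/i] at 2008-6-3 14:26 [url=http://bbs.loveunix.net/redirect.php?goto=findpost&pid=794725&ptid=85602][img]http://bbs.loveunix.net/images/common/back.gif[/img][/url]
Testing whether the serial link is up does not require stopping HA, but certainly not with the STTY method; it would be strange if that method worked without stopping HA!
But if the serial link is only intermittently down, and down only rarely, no test will help.
Deleting the core and applying patches is the only proper way. Analyze it? Analyze what? Is someone going to work out where the machine code is wrong ... [/quote]
Thanks, larryh. So this was caused by the serial link going down briefly by chance? I found the problem this morning and had the DBA check Oracle, who said nothing was abnormal, but I'm still uneasy. In hacmp.out there is an EVENT START at 1:02 this morning, and I don't know whether it is related. It feels as if HA did a takeover, yet the DBA found nothing abnormal in Oracle this morning. Note: this is Oracle 9.2 RAC running as a single node, currently on erp02; the host with today's problem is erp01.
I'm really on edge, afraid the production machine might go wrong some day... :'(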
[[i] Last edited by babyzhou at 2008-6-3 16:58 [/i]]
2008-6-3 16:55
larryh
Check whether rs232 is up while HA is running: clstat -a
2008-6-3 17:04
babyzhou
[quote]Originally posted by [i]larryh[/i] at 2008-6-3 16:55 [url=http://bbs.loveunix.net/redirect.php?goto=findpost&pid=794777&ptid=85602][img]http://bbs.loveunix.net/images/common/back.gif[/img][/url]
Check whether rs232 is up while HA is running: clstat -a [/quote]
clstat - HACMP Cluster Status Monitor
-------------------------------------
Cluster: cluster1 (1134728307)
Tue Jun 3 18:02:28 BEIDT 2008
State: UP Nodes: 2
SubState: STABLE
Node: erp01 State: UP
Interface: erp01_boot1 (0) Address: 192.168.100.1
State: UP
Interface: erp01_boot2 (0) Address: 192.168.101.1
State: UP
Interface: erp01_service (0) Address: 101.10.10.26
State: UP
Resource Group: casg1 State: On line
Resource Group: datavg State: On line
Node: erp02 State: UP
Interface: erp02_boot1 (0) Address: 192.168.100.2
State: UP
Interface: erp02_boot2 (0) Address: 192.168.101.2
State: UP
Interface: erp02_service (0) Address: 101.10.10.25
State: UP
Resource Group: casg2 State: On line
Resource Group: datavg State: On line
lssrc -g cluster
Subsystem Group PID Status
clsmuxpdES cluster 663652 active
clstrmgrES cluster 766090 active
clinfoES cluster 782418 active
[[i] Last edited by babyzhou at 2008-6-3 17:10 [/i]]
2008-6-3 18:15
larryh
Why doesn't the output above show an interface for the serial heartbeat? Did you configure one?