Add Failed Exadata ASM Disks back into ASM

Identify the Failed Grid Disk
# dcli -g cell_group -l root "cellcli -e list griddisk attributes name,asmmodestatus" | grep -v 'ONLINE'
 
tuxcel06: DATA_CD_08_tuxcel06            UNKNOWN    <<<<
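
The `grep -v` filter matches the string ONLINE anywhere on the line. A minimal sketch of a stricter filter that compares only the status column, assuming the `<cell>: <griddisk> <status>` layout shown above; the function name `list_not_online` and the second sample line are made up for illustration:

```shell
#!/usr/bin/env bash
# Print only grid disks whose asmmodestatus column is not exactly ONLINE.
# Assumes dcli output lines of the form "<cell>: <griddisk> <status>".
list_not_online() {
    awk 'NF >= 3 && $NF != "ONLINE" { sub(/:$/, "", $1); print $1, $2, $NF }'
}

# Sample input modelled on the output above (the ONLINE line is invented).
printf '%s\n' \
    'tuxcel06: DATA_CD_08_tuxcel06            UNKNOWN' \
    'tuxcel06: DATA_CD_09_tuxcel06            ONLINE' | list_not_online
```

On a real system the function would sit at the end of the pipeline in place of `grep -v 'ONLINE'`.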
SQL> SELECT * FROM V$ASM_DISK WHERE LABEL LIKE '%DATA_CD_08_TUXCEL06%' ORDER BY GROUP_NUMBER,DISK_NUMBER;
 
GROUP_NUMBER DISK_NUMBER MOUNT_STATUS HEADER_STATUS MODE_ST TOTAL_MB FREE_MB HOT_USED_MB COLD_USED_MB NAME LABEL               PATH
------------ ----------- ------------ ------------- ------- -------- ------- ----------- ------------ ---- ------------------- ---------------------------------
           0           1 CLOSED       MEMBER        ONLINE         0       0           0            0      DATA_CD_08_PUXCEL06 o/172.16.0.10/DATA_CD_08_puxcel06
 
GROUP_NUMBER = 0
DISK_NUMBER = 1
MOUNT_STATUS = CLOSED        <<<<
HEADER_STATUS = MEMBER       <<<< 
MODE_STATUS = ONLINE
PATH = o/172.16.0.10/DATA_CD_08_puxcel06
LABEL = DATA_CD_08_PUXCEL06

ASM Alert log errors
Fri Sep 30 05:02:37 2011
WARNING: Disk DATA_CD_08_PUXCEL06 in mode 0x7f is now being offlined
Fri Sep 30 05:02:37 2011
kfdp_queryTimeout(DATA)
kfdp_queryTimeout(DATA)
NOTE: cache closing disk 7 of grp 1: DATA_CD_08_TUXCEL06

This shows a problem with grid disk "DATA_CD_08_tuxcel06". It appears the disk was dropped because the DISK_REPAIR_TIME attribute (default 3.6 hours) expired. The disk needs to be added back to the DATA diskgroup.

Add disk back
SQL> ALTER diskgroup DATA ADD failgroup TUXCEL06 disk 'o/172.16.0.10/DATA_CD_08_tuxcel06' name DATA_CD_08_TUXCEL06 rebalance nowait;
ALTER diskgroup DATA ADD failgroup TUXCEL06 disk 'o/172.16.0.10/DATA_CD_08_tuxcel06' name DATA_CD_08_TUXCEL06 rebalance nowait
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15033: disk 'o/172.16.0.10/DATA_CD_08_puxcel06' belongs to diskgroup "DATA"

A MOUNT_STATUS of CLOSED means ASM can see the disk but is not using it. A HEADER_STATUS of CANDIDATE means the disk is not part of a diskgroup but can be added to one. MEMBER indicates diskgroup membership.

So the disk has GROUP_NUMBER = 0 but is still a diskgroup member, as indicated by HEADER_STATUS. The group number is assigned when the disk is mounted. If a disk is removed from a diskgroup, its header status becomes "FORMER" instead of "MEMBER".

The status will be "FORMER" if the disk was explicitly dropped with the "drop disk" command.

The status will be "CANDIDATE" if the disk header was missing.

Because the failed disk was not accessible, its header could not be updated, so the "MEMBER" value remains.

All disk partner information would have been updated to indicate the disk is no longer in the diskgroup.
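
The status rules above can be sketched as a small lookup. This is an illustration only: the function names `explain_header_status` and `needs_force` are invented here, only the three header states discussed on this page are handled, and the FORCE rule simply encodes the situation seen in this incident (a plain ADD rejected with ORA-15033).

```shell
#!/usr/bin/env bash
# Map a V$ASM_DISK HEADER_STATUS value to the interpretation given above.
explain_header_status() {
    case "$1" in
        MEMBER)    echo "header still marks the disk as a diskgroup member" ;;
        FORMER)    echo "disk was explicitly dropped from its diskgroup" ;;
        CANDIDATE) echo "no diskgroup header; disk can be added to a diskgroup" ;;
        *)         echo "not covered here: $1" ;;
    esac
}

# Succeeds when the disk is not mounted in any diskgroup (GROUP_NUMBER 0,
# MOUNT_STATUS CLOSED) but its header still says MEMBER, which is the case
# where a plain ADD failed with ORA-15033 on this page.
needs_force() {
    local group_number=$1 mount_status=$2 header_status=$3
    [ "$group_number" = "0" ] && [ "$mount_status" = "CLOSED" ] \
        && [ "$header_status" = "MEMBER" ]
}

# Values taken from the V$ASM_DISK row shown earlier.
explain_header_status MEMBER
if needs_force 0 CLOSED MEMBER; then
    echo "re-add with FORCE"
fi
```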

Re-add with the 'FORCE' option
SQL> ALTER diskgroup DATA ADD failgroup PUXCEL06 disk 'o/172.16.0.10/DATA_CD_08_puxcel06' name DATA_CD_08_PUXCEL06 FORCE rebalance nowait;
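
When several disks need the same treatment, the statement can be generated from the grid disk path. A rough sketch, assuming the naming pattern visible in the examples on this page (grid disk `<DISKGROUP>_CD_<nn>_<cell>`, failgroup equal to the upper-cased cell name, cell names without underscores); `build_force_add` is a hypothetical helper and its output should be reviewed before being run in SQL*Plus:

```shell
#!/usr/bin/env bash
# Build the FORCE re-add statement from a grid disk path such as
# o/172.16.0.10/DATA_CD_08_puxcel06.
# Assumption: grid disks are named <DISKGROUP>_CD_<nn>_<cell> and the
# failgroup is the upper-cased cell name, as in the examples on this page.
build_force_add() {
    local path=$1
    local griddisk=${path##*/}             # strip "o/<ip>/" prefix
    local diskgroup=${griddisk%%_CD_*}     # text before "_CD_"
    local cell=${griddisk##*_}             # text after the last underscore
    local name failgroup
    name=$(printf '%s' "$griddisk" | tr '[:lower:]' '[:upper:]')
    failgroup=$(printf '%s' "$cell" | tr '[:lower:]' '[:upper:]')
    printf "ALTER DISKGROUP %s ADD FAILGROUP %s DISK '%s' NAME %s FORCE REBALANCE NOWAIT;\n" \
        "$diskgroup" "$failgroup" "$path" "$name"
}

build_force_add 'o/172.16.0.10/DATA_CD_08_puxcel06'
```

For the path above this prints a statement with diskgroup DATA, failgroup PUXCEL06 and disk name DATA_CD_08_PUXCEL06, matching the command used in this step.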

######## Note ########

Exadata has an auto management feature that tries to add failed ASM disks back to the diskgroup, so in most cases you will not even notice that a disk was dropped, because it will already have been added back (as long as DISK_REPAIR_TIME has not lapsed).

You will see messages like the following in the ASM alert log:

NOTE: Exadata Auto Management: OS PID: 32721 Initiating Operation ID: 29 Operation: DROP and ADD ASM disk: ADD RECO_CD_11_PUXCEL04 in Diskgroup RECO SQL : /* Exadata Auto Mgmt: ADD ASM Disk in given FAILGROUP */ alter diskgroup RECO add failgroup PUXCEL04 disk 'o/172.16.0.8/RECO_CD_11_puxcel04' name RECO_CD_11_PUXCEL04 force rebalance nowait
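
To see which disks were re-added automatically, the alert log can be scanned for these messages. A minimal sketch, assuming each message appears on a single line in the format shown above (a real alert log may wrap it differently); `list_auto_adds` is a made-up name:

```shell
#!/usr/bin/env bash
# Read ASM alert log text on stdin and print "<diskgroup> <disk>" for every
# Exadata Auto Management ADD operation found.
# Assumes the single-line message format shown above.
list_auto_adds() {
    grep 'Exadata Auto Management' \
        | sed -n 's/.*: ADD \([^ ]*\) in Diskgroup \([^ ]*\).*/\2 \1/p'
}

# Sample input: the message from this page plus an unrelated alert log line.
list_auto_adds <<'EOF'
NOTE: Exadata Auto Management: OS PID: 32721 Initiating Operation ID: 29 Operation: DROP and ADD ASM disk: ADD RECO_CD_11_PUXCEL04 in Diskgroup RECO SQL : /* Exadata Auto Mgmt: ADD ASM Disk in given FAILGROUP */ alter diskgroup RECO add failgroup PUXCEL04 disk 'o/172.16.0.8/RECO_CD_11_puxcel04' name RECO_CD_11_PUXCEL04 force rebalance nowait
NOTE: cache closing disk 7 of grp 1: DATA_CD_08_TUXCEL06
EOF
```

In practice the function would be fed the ASM alert log file instead of the inline sample.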

identify_and_add_failed_exadata_disks_back_into_asm.txt · Last modified: 2019/09/16 16:09 (external edit)