Add Failed Exadata ASM Disks back into ASM
# dcli -g cell_group -l root "cellcli -e list griddisk attributes name,asmmodestatus" | grep -v 'ONLINE'
tuxcel06: DATA_CD_08_tuxcel06 UNKNOWN <<<<
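As a variation on the grep filter above, the status column can be compared exactly instead of matching the string anywhere on the line. This is a minimal sketch, assuming the dcli output has the three-field shape shown above (cell name with a trailing colon, grid disk name, asmmodestatus); the function name `list_non_online` is hypothetical.

```shell
#!/bin/sh
# Hypothetical filter: reads "cell: griddisk asmmodestatus" lines on stdin
# and prints cell, grid disk and status for every disk that is not ONLINE.
list_non_online() {
    # Skip lines with fewer than three fields; strip the colon dcli adds
    # after the cell name; compare the third field exactly.
    awk 'NF >= 3 && $3 != "ONLINE" { sub(/:$/, "", $1); print $1, $2, $3 }'
}
```

It would be used by piping the same dcli command into it: `dcli -g cell_group -l root "cellcli -e list griddisk attributes name,asmmodestatus" | list_non_online`.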
SQL> SELECT * FROM V$ASM_DISK WHERE LABEL LIKE '%DATA_CD_08_TUXCEL06%' ORDER BY GROUP_NUMBER,DISK_NUMBER;
--------------------------------------------------------------------------------
GROUP_NUMBER DISK_NUMBER MOUNT_STATUS HEADER_STATUS MODE_ST TOTAL_MB FREE_MB HOT_USED_MB COLD_USED_MB NAME LABEL PATH
--------------------------------------------------------------------------------
0 1 CLOSED MEMBER ONLINE 0 0 0 0 DATA_CD_08_PUXCEL06 o/172.16.0.10/DATA_CD_08_puxcel06

GROUP_NUMBER = 0
DISK_NUMBER = 1
MOUNT_STATUS = CLOSED <<<<
HEADER_STATUS = MEMBER <<<<
MODE_STATUS = ONLINE
PATH = o/172.16.0.10/DATA_CD_08_puxcel06
LABEL = DATA_CD_08_PUXCEL06
Fri Sep 30 05:02:37 2011
WARNING: Disk DATA_CD_08_PUXCEL06 in mode 0x7f is now being offlined
Fri Sep 30 05:02:37 2011
kfdp_queryTimeout(DATA)
kfdp_queryTimeout(DATA)
NOTE: cache closing disk 7 of grp 1: DATA_CD_08_TUXCEL06
This shows a problem with grid disk “DATA_CD_08_tuxcel06”. The disk appears to have been dropped because the DISK_REPAIR_TIMER (default 3.6 hours) expired, so it needs to be added back to the DATA diskgroup.
SQL> ALTER diskgroup DATA ADD failgroup TUXCEL06 disk 'o/172.16.0.10/DATA_CD_08_tuxcel06' name DATA_CD_08_TUXCEL06 rebalance nowait;
ALTER diskgroup DATA ADD failgroup TUXCEL06 disk 'o/172.16.0.10/DATA_CD_08_tuxcel06' name DATA_CD_08_TUXCEL06 rebalance nowait;
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15033: disk 'o/172.16.0.10/DATA_CD_08_puxcel06' belongs to diskgroup "DATA"
A mount_status of CLOSED means ASM can see the disk but it is not using it. A header_status of CANDIDATE means the disk is not part of a disk group but can be added to one. MEMBER indicates diskgroup membership.
So the disk has a group_number=0 but is still a diskgroup member, as indicated by HEADER_STATUS. The group number will be assigned when the disk is mounted. If a disk is removed from a diskgroup, it becomes “FORMER” instead of “MEMBER”.
The status will be “FORMER” if the disk was explicitly deleted with the “drop disk” command.
The status will be “CANDIDATE” if the disk header was missing.
Because the failed disk was not accessible, its header could not be updated, so the “MEMBER” value remains.
All disk partner information would have been updated to indicate the disk is no longer in the diskgroup.
SQL> ALTER diskgroup DATA ADD failgroup PUXCEL06 disk 'o/172.16.0.10/DATA_CD_08_puxcel06' name DATA_CD_08_PUXCEL06 FORCE rebalance nowait;
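The header-status reasoning above can be sketched as a small lookup: a disk whose header still says MEMBER needs FORCE on the ADD, while FORMER and CANDIDATE disks do not. This is only an illustration; the function name `add_option_for_header` is hypothetical and not part of any Oracle tool, and it covers only the three header states discussed here.

```shell
#!/bin/sh
# Hypothetical helper: map a V$ASM_DISK HEADER_STATUS value to the
# option the ADD ... DISK clause calls for.
add_option_for_header() {
    case "$1" in
        MEMBER)
            # The header still claims diskgroup membership, so a plain
            # ADD fails with ORA-15033 and FORCE is required.
            echo "FORCE" ;;
        FORMER|CANDIDATE)
            # The disk is free to be added; FORCE is not needed.
            echo "NOFORCE" ;;
        *)
            # Any other state is outside the cases covered here.
            echo "UNKNOWN" ;;
    esac
}
```

For the disk in this example, `add_option_for_header MEMBER` prints FORCE, which matches the statement that succeeded.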
######## Note ########
Exadata has an Auto Management feature that tries to add failed ASM disks back to the diskgroup, so in most cases you will not even notice that a disk was dropped, because it will already have been added back (as long as DISK_REPAIR_TIMER hasn’t lapsed).
You will see messages like the following in the ASM alert log:
NOTE: Exadata Auto Management: OS PID: 32721 Initiating Operation ID: 29
Operation: DROP and ADD ASM disk: ADD RECO_CD_11_PUXCEL04 in Diskgroup RECO
SQL : /* Exadata Auto Mgmt: ADD ASM Disk in given FAILGROUP */
alter diskgroup RECO add failgroup PUXCEL04 disk 'o/172.16.0.8/RECO_CD_11_puxcel04' name RECO_CD_11_PUXCEL04 force rebalance nowait
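To check whether Auto Management has already re-added a disk, the alert log can be searched for these entries. The sketch below assumes the log file path is passed as an argument (its location varies by installation and is not given here); the function name `auto_mgmt_entries` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the Exadata Auto Management entries found in
# the alert log file given as $1. The pattern matches both spellings seen
# in the sample message ("Exadata Auto Management" and "Exadata Auto Mgmt").
auto_mgmt_entries() {
    grep 'Exadata Auto M' "$1"
}
```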