infinibork (current revision 2019/09/16 16:09; previous revision 2018/08/29 13:57, ss_wiki_admin)

I run ibdiagnet one more time and see an error that catches my eye:

//Missing master SM in the discover fabric//

Now, I don't know everything about Infiniband, but I do know from my Exadata patching days that each Infiniband switch has an SM (Subnet Manager) …

Long story short, logging into each Infiniband switch and ensuring there was one with a higher SM priority, and then disabling it and re-enabling …

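To make the priority reasoning concrete, here is a toy model of how a master SM gets picked. It's a simplified sketch, not OpenSM's actual code: it assumes the InfiniBand preference rule that the highest SM priority wins and a tie goes to the lowest port GUID, and it ignores polling, handover timing and the other SM states. The switch names and GUIDs are made up for illustration.

```python
from dataclasses import dataclass, replace
from typing import Iterable, Optional


@dataclass(frozen=True)
class SubnetManager:
    switch: str          # hypothetical switch name, illustration only
    guid: int            # port GUID of the switch running the SM
    priority: int        # higher value = more preferred
    enabled: bool = True


def elect_master(sms: Iterable[SubnetManager]) -> Optional[SubnetManager]:
    """Toy election: highest priority wins, ties go to the lowest GUID.

    Assumption: models only the preference rule; real SMs also poll each
    other and hand mastership over in stages.
    """
    candidates = [sm for sm in sms if sm.enabled]
    if not candidates:
        return None  # no running SM at all, so no master in the fabric
    # Sort key: larger priority first (negated), then smaller GUID.
    return min(candidates, key=lambda sm: (-sm.priority, sm.guid))


if __name__ == "__main__":
    fabric = [
        SubnetManager("ibsw-a", 0x0010E00001000A01, priority=5),
        SubnetManager("ibsw-b", 0x0010E00001000B02, priority=5),
        SubnetManager("ibsw-c", 0x0010E00001000C03, priority=8),
    ]
    print(elect_master(fabric).switch)  # ibsw-c, the highest priority
    # Disable the preferred SM: a and b tie on priority, lower GUID wins.
    fabric[2] = replace(fabric[2], enabled=False)
    print(elect_master(fabric).switch)  # ibsw-a
```

In this model, giving one switch a clearly higher priority makes the intended master explicit instead of leaving the outcome to GUID order, and disabling/re-enabling an SM only changes which candidates are in the running.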
But back to the ILOM filesystem filling up. The ILOM firmware update didn't seem to have reduced the amount of logging, but I was able to identify the source by paying more attention to the processes on the machine. I've known for a long time that //kipmi0// runs hot on the CPU on this machine (albeit at very low priority), and now the repeated ILOM logins suggested that an IPMI-interfacing tool was hanging around, perhaps logging in regularly to check the ILOM for alerts. On a hunch I shut down the process I suspected, and I was right: the //kipmi0// activity comes from the checks made by the Oracle Hardware Management Pack (//hwmgmtd//). In other words, it was hwmgmtd that was filling up the logs on the ILOM. The ILOM should handle this better, and hopefully the newer firmware will, but time will tell whether I end up asking the vendor for a fix (no newer version is currently available for this OS).

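If you want to check the //kipmi0// observation on your own box, a minimal sketch like the one below reads the Linux ''/proc/<pid>/stat'' files (field layout as documented in proc(5)) and reports the nice value and accumulated CPU ticks for every process with a given name. It assumes a Linux host; the helper names are my own and nothing here is part of the Oracle tooling.

```python
import os
from typing import Dict, List, Union


def parse_stat(line: str) -> Dict[str, Union[int, str]]:
    """Parse one /proc/<pid>/stat line (Linux proc(5) field layout)."""
    lparen = line.index("(")
    rparen = line.rindex(")")  # comm may itself contain ')' characters
    rest = line[rparen + 2:].split()
    # rest[0] is field 3 (state), so proc(5) field N is rest[N - 3].
    return {
        "pid": int(line[:lparen].strip()),
        "comm": line[lparen + 1:rparen],
        "state": rest[0],
        "cpu_ticks": int(rest[11]) + int(rest[12]),  # utime + stime
        "nice": int(rest[16]),
    }


def find_processes(name: str, proc_root: str = "/proc") -> List[Dict[str, Union[int, str]]]:
    """Return parsed stat info for every process whose comm equals name."""
    if not os.path.isdir(proc_root):
        return []  # not a Linux-style /proc; nothing to scan
    matches = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                info = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or stat was unreadable
        if info["comm"] == name:
            matches.append(info)
    return matches


if __name__ == "__main__":
    for proc in find_processes("kipmi0"):
        print(f"pid={proc['pid']} nice={proc['nice']} cpu_ticks={proc['cpu_ticks']}")
```

Running it a few times and watching ''cpu_ticks'' grow (or stop growing after stopping hwmgmtd) is a cheap way to see whether the IPMI traffic has quietened down.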
Urgh, what a rabbit hole. I surely could have done that better, but I'm glad I inadvertently discovered …

<code>
On the management controller, type:

# setsmpriority priority

where priority is 0 (lowest) to 13 (highest). For example:

# setsmpriority 3
-------------------------------------------------
OpenSM 3.2.6_20090717





Command Line Arguments:


Log File: /
-------------------------------------------------
#

Restart the Subnet Manager:

# disablesm
Stopping IB Subnet Manager..
# enablesm
Starting IB Subnet Manager.
#
</code>
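After restarting the SMs it's worth confirming that a master SM is actually back. One way is the ''sminfo'' tool from infiniband-diags, run on a host attached to the fabric. The sketch below parses its one-line output; the regular expression is an assumption based on the usual ''sminfo: sm lid … sm guid …, activity count … priority … state … SMINFO_…'' layout, so compare it against what your version prints before relying on it.

```python
import re
import shutil
import subprocess
from typing import Dict, Union

# Assumed layout of `sminfo` output (infiniband-diags); verify on your system.
SMINFO_RE = re.compile(
    r"sm lid (?P<lid>\d+) sm guid (?P<guid>0x[0-9a-fA-F]+), "
    r"activity count (?P<activity>\d+) priority (?P<priority>\d+) "
    r"state (?P<state>\d+) (?P<state_name>\w+)"
)


def parse_sminfo(text: str) -> Dict[str, Union[int, str]]:
    """Extract LID, GUID, priority and state from sminfo output."""
    match = SMINFO_RE.search(text)
    if match is None:
        raise ValueError(f"unrecognised sminfo output: {text!r}")
    return {
        "lid": int(match["lid"]),
        "guid": int(match["guid"], 16),  # int() accepts the 0x prefix with base 16
        "priority": int(match["priority"]),
        "state": int(match["state"]),
        "state_name": match["state_name"],
    }


def is_master(info: Dict[str, Union[int, str]]) -> bool:
    # SMINFO_MASTER is the state name sminfo reports for the active master SM.
    return info["state_name"] == "SMINFO_MASTER"


if __name__ == "__main__":
    if shutil.which("sminfo") is None:
        print("sminfo not found; install infiniband-diags or run on a fabric host")
    else:
        out = subprocess.run(["sminfo"], capture_output=True, text=True, check=True).stdout
        info = parse_sminfo(out)
        print(f"master SM present: {is_master(info)} "
              f"(priority {info['priority']}, guid {info['guid']:#x})")
```

If there is no master at all, I'd expect ''sminfo'' to fail rather than print a line, which ''check=True'' turns into an exception; either way you learn something.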