This morning we received an email from the ILOM of a Sun Oracle X3-2L server: The suspect component: /SYS/SP has defect.ilom.fs.full
A full filesystem on an ILOM? Well I never… So I logged into the ILOM to have a quick poke around, on both the CLI and the web interface, but there's no directly accessible filesystem or shell for making changes to the ILOM 'filesystem' (short of dropping deep into the ILOM's boot sector, but that's a write-up for another day). I figured it was just some log file filling up, and that cleaning it up would hopefully do the trick.
So I cleared the log files and rebooted the ILOM. After it came back up all seemed OK: the alert auto-resolved and all was well. Except I noticed that the ILOM audit log had an entry for a login every minute or so. This could be why the log eventually filled up after we patched this ILOM about 4 months ago.
At this stage I figured it could just be a little bug in the logging level or something, so I pulled down a new firmware and installed it, hoping for a quick fix and a free patch in the process. Of course, part of the ILOM firmware patch is also a BIOS patch, so the server needed a reboot. Luckily this particular server is Well Looked After™, so one quick boot later (well, quicker than an HP DL380g9, that's for sure) the system was back up and running. All good.
Except no - the InfiniBand interfaces, which run IP over InfiniBand, were showing as up with link, but couldn't send or receive any network traffic.
Oracle hwmgmtd fills up the logs