Many Solaris admins are aware of the Fault Management Architecture (FMA) in Solaris. However, in my experience few of them make a habit of regularly checking the output of fmadm list for faults that Solaris has detected. Solaris now includes a new PAM module that tells you at login that looking into the FMA information may not be the dumbest idea.
It’s already in the default PAM configuration:
jmoekamp@testbed:~$ grep -i "session" /etc/pam.d/other
# Default definition for Session management
# Used when service name is not explicitly mentioned for session management
session definitive pam_user_policy.so.1
session required pam_unix_session.so.1
session optional pam_fm_notify.so.1
However, by default the module does nothing. You won’t see a message prompting you to check the FMA:
joergmoellenkamp@Mac ~ % ssh jmoekamp@192.168.39.122
(jmoekamp@192.168.39.122) Password:
Last login: Thu Apr 3 05:09:23 2025 from 192.168.39.121
Oracle Solaris 11.4.79.189.2 Assembled March 2025
jmoekamp@testbed:~$
The reason the message is not shown by default is simple: not every user who is allowed to log in to the system should necessarily see this kind of information. To get it at login, a user needs an additional authorisation called solaris.fm.read. You can add this authorisation to a user via usermod:
root@testbed:~# usermod -A +solaris.fm.read jmoekamp
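If you want to confirm that the authorisation was really assigned before logging in again, a small check like the following may help. It is only a sketch: has_auth is a hypothetical helper name, and it assumes that auths prints the user's authorisations as one comma-separated list. It only looks for an exact match and does not evaluate wildcard grants.

```shell
# Hypothetical helper: succeeds if the comma-separated authorisation
# list in $1 contains exactly the authorisation named in $2.
# Assumption: the list has the form "auth1,auth2,...". Wildcard
# entries are not expanded.
has_auth() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Possible usage on the Solaris host:
#   has_auth "$(auths jmoekamp)" solaris.fm.read && echo "granted"
```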
Next time you log in as jmoekamp you will see a small but useful addition to the output:
joergmoellenkamp@Mac ~ % ssh jmoekamp@192.168.39.122
(jmoekamp@192.168.39.122) Password:
Last login: Thu Apr 3 05:11:43 2025 from 192.168.39.121
NOTE: system has 2 active diagnoses; run 'fmadm list' for details.
Oracle Solaris 11.4.79.189.2 Assembled March 2025
I think that it’s a very clever use of PAM.
But I feel like multiple things have failed for this to be the first notification.
Why hasn’t the *LOM sent an alert email / SNMP trap?
Why hasn’t periodic monitoring of fault management as part of overall monitoring alerted?
I mean for belt and suspenders, pam_fm_notify is suspenders. But where’s the belt?
All of that being said, I’m considering installing pam_fm_notify as an additional level of notification.
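The periodic monitoring mentioned in this comment could be sketched roughly as follows. This is an illustration under assumptions, not a tested monitoring plugin: fm_check is a hypothetical helper name, and it assumes that fmadm list prints nothing when there are no active diagnoses. How the warning reaches your alerting system is left open.

```shell
# Hypothetical helper: reads the output of 'fmadm list' on stdin.
# Returns 0 when the input is empty (assumed to mean: nothing diagnosed),
# otherwise prints a one-line warning and returns 1.
fm_check() {
    report=$(cat)
    if [ -z "$report" ]; then
        return 0
    fi
    echo "WARNING: fmadm list reported active diagnoses"
    return 1
}

# Possible usage from a scheduled job running with sufficient privileges:
#   fmadm list | fm_check
```

Reading the report from stdin keeps the helper independent of how fmadm is invoked, so the same function can be fed from a saved report as well.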
Do you know a way to disable the fault propagation from LDOMs to the CDOM and/or ILOM? I can only find articles where Oracle announced that the feature had been added, but no information on how to configure or disable it.
This is quite annoying, as e.g. a user coredumping in an LDOM would be a "can be checked tomorrow" thing, while a fault reported in the CDOM is usually a "Call Woo right now!" issue.