OpenSAF includes a healthcheck mechanism to monitor the health of processes, detect failures, and initiate recovery actions such as restarting processes or failing over to another node. This ensures high availability and enhances fault tolerance within the system.
Key classes:
SaAmfHealthcheckType
SaAmfHealthcheck
Key Attributes:
safHealthcheckKey
saAmfHctDefPeriod, saAmfHealthcheckPeriod: The interval at which AMF sends a health check request to the program.
saAmfHctDefMaxDuration, saAmfHealthcheckMaxDuration: The maximum time the program may take to respond to a health check. If there is no response within this time, AMF considers the program unresponsive and triggers a restart.
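To make the two timing attributes concrete, here is a minimal sketch of how they relate. It assumes the period is the spacing between health check requests and the maximum duration is the deadline for each response. This is an illustration only, not OpenSAF code: `hc_send_time` and `hc_response_in_time` are hypothetical helpers, and all values are in nanoseconds to match the attribute units.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of the AMF API: time at which the n-th
 * health check (n = 0, 1, ...) would be sent if the first one goes out
 * at start_ns and checks are spaced period_ns apart. */
int64_t hc_send_time(int64_t start_ns, int64_t period_ns, int64_t n)
{
    assert(period_ns > 0 && n >= 0);
    return start_ns + n * period_ns;
}

/* Hypothetical helper, not part of the AMF API: returns true when a
 * response received at responded_ns answers a health check sent at
 * sent_ns within max_duration_ns. */
bool hc_response_in_time(int64_t sent_ns, int64_t responded_ns,
                         int64_t max_duration_ns)
{
    assert(responded_ns >= sent_ns);
    assert(max_duration_ns > 0);
    return (responded_ns - sent_ns) <= max_duration_ns;
}
```

With a period of 5 seconds and a maximum duration of 3 seconds, a response that arrives 2 seconds after the request is in time, while one that arrives 4 seconds after it is late.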
In the previous post, we implemented an AMF SA-aware program. Today, we will integrate a healthcheck into the program.
$ immcfg -c SaAmfHealthcheck \
-a saAmfHealthcheckPeriod=5000000000 \
-a saAmfHealthcheckMaxDuration=3000000000 \
safHealthcheckKey=demo,safComp=demo,safSu=SC-1,safSg=demo,safApp=demo
# immcfg -c SaAmfHealthcheckType \
# -a saAmfHctDefPeriod=10000000000 \
# -a saAmfHctDefMaxDuration=5000000000 \
# safHealthcheckKey=demo,safVersion=1,safCompType=demo
# all time values are in nanoseconds
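Because the attributes take nanoseconds, it is easy to drop or add a zero when typing them by hand. The following sketch derives the values from whole seconds. The names `DEMO_NS_PER_SECOND`, `demo_seconds_to_ns` and `demo_print_attr` are made up for this example and are not part of OpenSAF.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical constant for this sketch: nanoseconds in one second. */
#define DEMO_NS_PER_SECOND INT64_C(1000000000)

/* Convert whole seconds to the nanosecond value the attributes expect. */
int64_t demo_seconds_to_ns(int64_t seconds)
{
    assert(seconds >= 0 && seconds <= INT64_MAX / DEMO_NS_PER_SECOND);
    return seconds * DEMO_NS_PER_SECOND;
}

/* Print one "-a name=value" argument in the form used on the command line. */
void demo_print_attr(const char *name, int64_t seconds)
{
    printf("-a %s=%" PRId64 "\n", name, demo_seconds_to_ns(seconds));
}
```

For example, `demo_print_attr("saAmfHealthcheckPeriod", 5)` prints `-a saAmfHealthcheckPeriod=5000000000`, which is the value used in the command above.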
void healthcheckCallback(SaInvocationT invocation,
    const SaNameT *compName, SaAmfHealthcheckKeyT *key);
int healthcheckCount = 0;
int main(int argc, char ** argv)
{
    // ...
    callbacks.saAmfHealthcheckCallback = healthcheckCallback;
    saAmfInitialize_o4(&amfHandler, &callbacks, &apiVersion);
    // start healthcheck
    saAmfHealthcheckStart(amfHandler, &name, &healthcheckKey,
        SA_AMF_HEALTHCHECK_AMF_INVOKED, SA_AMF_COMPONENT_RESTART);
    // ...
}
void healthcheckCallback(SaInvocationT invocation,
    const SaNameT *compName, SaAmfHealthcheckKeyT *key)
{
    trace("count = %d", healthcheckCount);
    healthcheckCount += 1;
    saAmfResponse_4(amfHandler, invocation, 0, SA_AIS_OK);
}
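The code above passes `&healthcheckKey` without showing how the key is filled in. Here is a minimal sketch, assuming the key is a fixed-size byte array plus a length field, as in the SA Forum AMF C binding. To keep the example compilable without the OpenSAF headers, it uses a local stand-in type, `DemoHealthcheckKey`, and a hypothetical helper, `demo_set_healthcheck_key`. In a real program you would include `saAmf.h` and fill a `SaAmfHealthcheckKeyT` instead.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local stand-ins so the sketch compiles without the OpenSAF headers.
 * They are assumed to mirror SaAmfHealthcheckKeyT from saAmf.h; check
 * the header on your system before relying on the layout. */
#define DEMO_HEALTHCHECK_KEY_MAX 32
typedef struct {
    uint8_t key[DEMO_HEALTHCHECK_KEY_MAX];
    uint16_t keyLen;
} DemoHealthcheckKey;

/* Copy a NUL-terminated key name into the key structure.
 * Returns 0 on success, -1 if the name is empty or too long. */
int demo_set_healthcheck_key(DemoHealthcheckKey *hk, const char *name)
{
    assert(hk != NULL && name != NULL);
    size_t len = strlen(name);
    if (len == 0 || len > DEMO_HEALTHCHECK_KEY_MAX)
        return -1;
    memset(hk, 0, sizeof(*hk));
    memcpy(hk->key, name, len);   /* the key is length-delimited, not NUL-terminated */
    hk->keyLen = (uint16_t)len;
    return 0;
}
```

Calling `demo_set_healthcheck_key(&hk, "demo")` stores the same key name as the `safHealthcheckKey=demo` value configured earlier. The two have to match for AMF to find the healthcheck configuration.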
$ amf-adm unlock-in safSu=SC-1,safSg=demo,safApp=demo
# syslog
2024-12-26T10:50:00.608935-05:00 SC-1 osafamfnd[15994]: NO 'safSu=SC-1,safSg=demo,safApp=demo' Presence State UNINSTANTIATED => INSTANTIATING
2024-12-26T10:50:00.613154-05:00 SC-1 demo[17859]: main: start
2024-12-26T10:50:00.614284-05:00 SC-1 osafamfnd[15994]: NO 'safSu=SC-1,safSg=demo,safApp=demo' Presence State INSTANTIATING => INSTANTIATED
2024-12-26T10:50:00.614606-05:00 SC-1 demo[17859]: main: receive poll event
2024-12-26T10:50:00.614704-05:00 SC-1 demo[17859]: healthcheckCallback: count = 0
2024-12-26T10:50:10.663296-05:00 SC-1 demo[17859]: main: receive poll event
2024-12-26T10:50:10.663724-05:00 SC-1 demo[17859]: healthcheckCallback: count = 1
2024-12-26T10:50:20.761590-05:00 SC-1 demo[17859]: main: receive poll event
2024-12-26T10:50:20.761904-05:00 SC-1 demo[17859]: healthcheckCallback: count = 2
$ amf-adm unlock safSu=SC-1,safSg=demo,safApp=demo
# syslog
2024-12-26T10:51:55.920498-05:00 SC-1 osafamfnd[15994]: NO Assigning 'safSi=demo,safApp=demo' ACTIVE to 'safSu=SC-1,safSg=demo,safApp=demo'
2024-12-26T10:51:55.920990-05:00 SC-1 demo[17859]: main: receive poll event
2024-12-26T10:51:55.921097-05:00 SC-1 demo[17859]: csiSetCallback:
2024-12-26T10:51:55.921154-05:00 SC-1 osafamfnd[15994]: NO Assigned 'safSi=demo,safApp=demo' ACTIVE to 'safSu=SC-1,safSg=demo,safApp=demo'