---
canonical: https://safekit.evidian.com/wp-content/uploads/downloads_safekit/version-82/safekituserguidehtml/documentation/safekituserguideen.htm
---

## 13.8 Module scripts - <user>, <var>

This section describes only the configuration options available for the <user> tag. Refer to section 14 for a full description of module scripts.

When this tag is not set, the module
scripts are not executed.

### 13.8.1 <user> example

<user>

  <var name="*name1*" value="*value1*"/>

</user>

**Note:** For an example of <var> usage, refer to section 15.3. See also the full examples of a mirror module in section 15.1 and a farm module in section 15.2; each presents the configuration via the web console along with the corresponding userconfig.xml.

### 13.8.2 <user> syntax

<user

   [nicestoptimeout="300s"]

   [forcestoptimeout="300s"]

   [logging="userlog"|"none"]

   [userlogsize="2048"]

 >

   <var name="*name1*" value="*value1*"/>

   …

</user>

**Note:** The <user> tag and its full subtree can be changed with a dynamic configuration.

### 13.8.3 <user>, <var> attributes


| Attribute | Description |
| --- | --- |
| <user |  |
| [nicestoptimeout="300s"] | Timeout delay in seconds to execute the stop\_xx script. Default value: 300s (300 seconds). Note: time units are supported since SafeKit 8.2.5 (see section 13.1). |
| [forcestoptimeout="300s"] | Timeout delay in seconds to execute the stop\_xx -force script. Default value: 300s (300 seconds). Note: time units are supported since SafeKit 8.2.5 (see section 13.1). |
| [logging="userlog"\|"none"] | Logging of the stdout and stderr messages of the application started in scripts. · logging="userlog": messages are redirected into the log SAFEVAR/modules/*AM*/userlog\_<year>\_<month>\_<day>T<time>\_<script name>.ulog, where *AM* is the module name (SAFEVAR=C:\safekit\var on Windows and SAFEVAR=/var/safekit on Linux). · logging="none": messages are not logged. Default value: userlog |
| [userlogsize="2048"] | Size limit of the userlog, in KB. On module start, the file is truncated to 0 if its size has reached this limit. Default value: 2048 KB |
| [<var name="*name1*" value="*value1*"/>] | Optional environment variable, exported with its value before the execution of the module scripts. Define as many <var> entries as there are environment variables to export. |

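To make the attribute defaults concrete, the following Python sketch reads a <user> section with the standard xml.etree.ElementTree module, applies the defaults listed in the table, and builds the environment a module script would receive from the <var> entries. It is an illustration, not SafeKit's own parser: the helper names are hypothetical, and the time parser only handles plain seconds and the s suffix (section 13.1 may allow other units).

```python
import os
import xml.etree.ElementTree as ET

# Defaults from the <user> attribute table.
DEFAULTS = {
    "nicestoptimeout": "300s",
    "forcestoptimeout": "300s",
    "logging": "userlog",
    "userlogsize": "2048",
}


def parse_seconds(value: str) -> int:
    """Convert '300s' or '300' to seconds (other time units are not handled)."""
    value = value.strip()
    return int(value[:-1]) if value.endswith("s") else int(value)


def read_user_section(xml_text: str) -> dict:
    """Return the <user> settings with defaults applied, plus the <var> variables."""
    user = ET.fromstring(xml_text)
    attrs = {key: user.get(key, default) for key, default in DEFAULTS.items()}
    return {
        "nicestoptimeout": parse_seconds(attrs["nicestoptimeout"]),
        "forcestoptimeout": parse_seconds(attrs["forcestoptimeout"]),
        "logging": attrs["logging"],
        "userlogsize_kb": int(attrs["userlogsize"]),
        "vars": {var.get("name"): var.get("value") for var in user.findall("var")},
    }


def script_environment(settings: dict) -> dict:
    """Environment for a module script: the current environment plus each <var>."""
    env = dict(os.environ)
    env.update(settings["vars"])
    return env
```

The dictionary returned by script\_environment could be passed as env= to subprocess.run to mimic how the variables are exported before a script runs.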
## 13.9 Virtual hostname - <vhost>, <virtualhostname>

The virtual hostname (vhost) allows
applications to see a virtual host name that is independent of the server’s
actual name. This is especially useful when applications need a consistent
hostname across all nodes, for example when the name is stored in a replicated
file.

### 13.9.1 <vhost> example

<vhost>

  <virtualhostname name="*vhostname*" envfile="*vhostenv*"/>

</vhost>

**Note:** See also the example in section 15.12. It presents the configuration via the web console along with the corresponding userconfig.xml.

### 13.9.2 <vhost> syntax

<vhost>

  <virtualhostname

     name="virtual\_hostname"

     envfile="*path\_of\_a\_file*"

     [when="prim"|"second"|"both"]

  />

</vhost>

**Note:** The <vhost> tag and its subtree cannot be changed with a dynamic configuration.

### 13.9.3 <vhost>, <virtualhostname> attributes

| Attribute | Description |
| --- | --- |
| <vhost> |  |
| <virtualhostname |  |
| name="*virtual\_hostname*" | Definition of the virtual hostname. |
| envfile="*path\_of\_envfile*" | Path of the environment file automatically generated by SafeKit during the configuration command. If the path is relative, the file is generated in the runtime environment of the application module, i.e. SAFEUSERBIN. This generated environment file is used in the module scripts to set the virtual hostname before starting and stopping the application. See the module template vhost.safe delivered with the Linux and Windows packages. |
| [when="prim"\|"second"\|"both"] | Defines when the virtual hostname is returned to the application instead of the physical one. Default value: prim, meaning when the server is primary (PRIM or ALONE). |
| /> |  |
| </vhost> |  |

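The following Python sketch models the two rules in this table: where a relative envfile ends up, and in which module state the application sees the virtual rather than the physical name. Only the prim mapping (PRIM or ALONE) comes from the table; the state sets used for second and both, the function names, and the example paths are assumptions for illustration.

```python
import os

# "prim" (PRIM or ALONE) comes from the table; the other two sets are assumptions.
WHEN_STATES = {
    "prim": {"PRIM", "ALONE"},
    "second": {"SECOND"},
    "both": {"PRIM", "ALONE", "SECOND"},
}


def visible_hostname(physical: str, virtual: str, state: str,
                     when: str = "prim") -> str:
    """Name returned to the application for the given module state."""
    return virtual if state in WHEN_STATES[when] else physical


def envfile_location(envfile: str, safeuserbin: str) -> str:
    """A relative envfile is generated under SAFEUSERBIN; an absolute one is kept."""
    if os.path.isabs(envfile):
        return envfile
    return os.path.join(safeuserbin, envfile)
```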
### 13.9.4 <vhost> description

Some applications need to see the same hostname
on all SafeKit servers (typically, because it is stored in a replicated file). With
the virtual hostname, these applications see the virtual name whereas other applications
see the physical name.

· On Linux

The implementation is based on the LD\_PRELOAD environment variable: the gethostname and uname functions are overloaded.

· On Windows

The implementation is based on the CLUSTER\_NETWORK\_NAME\_ environment variable: the query API functions (GetComputerName, GetComputerNameEx, gethostname) take this variable into account. To use vhost for a service, run the command vhostservice <service> [<file>] before/after the service start/stop.

## 13.10 Process or service monitoring - <errd>, <proc>

This section describes the configuration options available for the <errd> tag. errd monitors critical processes and services within a SafeKit module. It automatically detects their failures and triggers a corrective action such as a module restart, stop or stopstart; the latter two may trigger a failover.

Specify class="prim" to activate monitoring only on the primary node, ![](safekituserguideen_fichiers/image382.jpg) ALONE | PRIM (Ready), for a mirror module.

Specify class="both" to activate monitoring on all nodes, ![](safekituserguideen_fichiers/image383.jpg) UP (Ready), for a farm module.

The maxloop configuration
attribute limits the number of recovery attempts after an error is detected (see section 13.3). If the issue persists
beyond this limit, the module is stopped locally, and a failover is initiated.

**Important:** The <errd> section requires the <user/> section.

### 13.10.1 <errd> example

**Note:** See also a full example in section 15.4. It presents the configuration via the web console along with the corresponding userconfig.xml.

#### 13.10.1.1 Process monitoring

· Linux and Windows

*myproc* is the
command name of the process to monitor:

<errd>

  <proc name="*myproc*" action="restart" class="*prim*"/>

</errd>

· Linux only (since SafeKit > 7.2.0.29)

*oracle\_.\** is a regular expression on the command name of the process to monitor:

<errd>

  <proc name="oracle" nameregex="*oracle\_.\**" action="restart" class="*prim*"/>

</errd>

**Important:** Specify class="prim" for a mirror module; class="both" for a farm module.

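nameregex (and argregex, described in section 13.10.3) select processes by regular expression. The Python sketch below shows how such expressions behave on sample command names and argument lists, assuming search semantics as with POSIX regexec, where the expression may match anywhere unless anchored with ^ and $. Python's re module stands in for the POSIX Extended engine that SafeKit uses; the two agree on simple patterns like these but not on every construct. The case\_sensitive flag mirrors the Linux (sensitive) and Windows (insensitive) modes described for argregex.

```python
import re


def matches(pattern: str, text: str, case_sensitive: bool = True) -> bool:
    """Search for pattern anywhere in text (assumed regexec-like semantics)."""
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.search(pattern, text, flags) is not None
```

For instance, matches(r"oracle\_.\*", "oracle\_pmon\_orcl") is true while matches(r"oracle\_.\*", "tnslsnr") is false; anchoring the expression with ^ and $ restricts it to the whole string.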
#### 13.10.1.2 Service monitoring

*myservice* is the name of a service to monitor. On Windows, it is the name of a Windows service (since SafeKit > 7.3). On Linux, it is the name of a systemd service (since SafeKit > 7.4.0.19).

<errd> 

  <proc name="*myservice*" service="yes" action="restart" class="*prim*"/>

</errd>

**Important:** Specify class="prim" for a mirror module; class="both" for a farm module.

#### 13.10.1.3 Service monitoring with targeted service restart

Since SafeKit 8.2.6, the new restart\_services attribute allows targeted restarts of specific services instead of
restarting all services globally. This attribute contains the names of the
services to restart, listed in startup order and separated by commas.

<errd> 

  <proc name="*Service1*" service="yes" action="restart" restart\_services="*Service1*" class="*prim*"/>

  <proc name="*Service2*" service="yes" action="restart" restart\_services="*Service2, Service3*" class="*prim*"/>

</errd>

**Important:** Specify class="prim" for a mirror module; class="both" for a farm module.

In the example above, if *Service2* stops, errd triggers the module restart action. This results in the execution of stop\_prim followed by start\_prim with the additional parameter -Services "Service2, Service3". If the scripts are designed to handle this parameter, they can stop and start only the specified services instead of all services.

This feature is particularly useful when
services are independent, as it helps reduce restart time.
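A module script that honors -Services has to turn the parameter value into a list of services. The Python sketch below is hypothetical and not taken from the delivered mirror.safe or farm.safe scripts; the service list ALL\_SERVICES and the stop-in-reverse-order convention are assumptions. It shows the parsing logic such a script could use: without -Services it acts on every service, with it only on the listed services, in the given startup order.

```python
import argparse
from typing import List

# Hypothetical services managed by the module, in startup order.
ALL_SERVICES = ["Service1", "Service2", "Service3"]


def services_to_handle(argv: List[str]) -> List[str]:
    """Services a start/stop script should act on, given its arguments.

    With -Services "Service2, Service3" only those services are returned,
    in the listed (startup) order; without it, all module services.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("-Services", dest="services", default=None)
    args, _unknown = parser.parse_known_args(argv)
    if args.services is None:
        return list(ALL_SERVICES)
    return [name.strip() for name in args.services.split(",") if name.strip()]


def stop_order(services: List[str]) -> List[str]:
    """Stop in reverse startup order (a common convention, assumed here)."""
    return list(reversed(services))
```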

The mirror.safe (refer to section 15.1) and farm.safe (refer to section 15.2) modules, delivered since SafeKit 8.2.4, include start\_xx/stop\_xx scripts that support the -Services parameter. For older modules that do not include these scripts:

· either modify the existing module scripts to handle the -Services parameter

· or migrate the existing configuration to the new module template

**Important:** If the scripts do not support the -Services parameter, the restart applies to all services despite the restart\_services setting.

Furthermore,
if your old module contained a configuration with a specific handler as
illustrated below, this configuration can potentially be migrated to the new
implementation based on the restart\_services attribute and the module scripts that support the -Services
parameter.

<errd>

  <proc name="myservice" service="yes" atleast="1" action="restart\_myservice" class="myservice"/>

</errd>

This migration provides better integration of targeted service restarts. The configuration using a specific handler therefore becomes obsolete, although it is still supported for backward compatibility.

### 13.10.2 <errd> syntax

<errd

   [polltimer="30s"]

>

  <proc

     name="command name and/or resource name for the monitored process or service"

     [service="no"|"yes"]

     [nameregex="regular expression on the command name"]

     [argregex="regular expression on process arguments, including command name"]

     atleast="1"

     action="stopstart"|"restart"|"stop"|"noaction"|"*executable\_name*"

     [restart\_services="targeted services to restart, listed in startup order and separated by commas"]

     class="prim"|"both"|"pre"|"second"|"sec"|"*othername*"

     [start\_after="*nb polling cycles*"]

     [atmax="-1"]

  />

  …

</errd>

**Important:** The <errd> tag and its full subtree can be changed with a dynamic configuration.

### 13.10.3 <errd>, <proc> attributes

| Attribute | Description |
| --- | --- |
| <errd |  |
| [polltimer="30s"] | Time delay, in seconds, between two polls of the list of processes. Default value: 30s (30 seconds). Note: time units are supported since SafeKit 8.2.5 (see section 13.1). |
| <proc | Definition of a process to monitor. Set as many <proc> sections as there are processes to monitor. A resource is associated with each <proc>; it is named proc.<value of the name attribute> (e.g. proc.process\_name). The resource is up when the monitoring condition is true, and down otherwise. |
| name="*command\_name*" | *command\_name* is the command name of the process to monitor. It is also the name of the resource associated with the monitored process. Maximum 15 characters on Linux (the command name may be truncated); 63 on Windows. For example: · name="vi" on Linux · name="notepad.exe" on Windows. Important: on Windows only, the name is automatically converted to lower case. See section 13.10.4 for help on retrieving the process command name. |
| **Or** name="*service\_name*" service="yes" | *service\_name* is the name of the service to monitor. It is also the name of the resource associated with the monitored service. Maximum 63 characters. For example: · name="W32Time" service="yes" for monitoring the Windows Time service · name="ntpd" service="yes" for monitoring the Linux time service (systemd ntpd.service). The service attribute is optional. Default value: no |
| **Or** name="*command\_name*" nameregex="*regular expression on the command name*" | **Linux only.** nameregex is a regular expression applied to the command name to select the process to monitor; name is the name of the resource associated with the monitored process. Important: as regular expressions are defined inside the XML file userconfig.xml, characters interpreted by XML, such as '<' or '>', cannot be used in regular expressions. For example: · nameregex="oracle\_.\*" name="oracle" monitors the oracle processes that match the regular expression; the associated resource is proc.oracle. The nameregex attribute is optional. |
| class="prim"\|"both"\|"pre"\|"second"\|"sec"\|"*othername*" | The process belongs to a class that defines when monitoring is active or inactive, depending on the module state. Use class="prim" to activate monitoring only on the primary node for a mirror module. Use class="both" to activate monitoring on all nodes for a farm module. · class="prim"\|"both"\|"pre"\|"second"\|"sec": activation/deactivation of these classes is done automatically by the <user/> component after/before running start\_prim/stop\_prim, start\_both/stop\_both, start\_second/stop\_second, start\_sec/stop\_sec. For script details, see section 14. · class="*othername*": for nonstandard classes, you must explicitly enable/disable process monitoring after/before the start/stop of the process, with the command safekit errd enable\|disable "classname" -m *AM*. |
| [argregex="*regular expression on process arguments*"] | Regular expression matching the list of arguments of the process to monitor, including the executable name. Optional parameter. The regex engine is POSIX Extended regex (see the POSIX documentation): · on Windows, case-insensitive mode · on Linux, case-sensitive mode. Important: as regular expressions are defined inside the XML file userconfig.xml, characters interpreted by XML, such as '<' or '>', cannot be used in regular expressions. See section 13.10.4 for help on retrieving the list of arguments of a process. · Linux examples with the vi editor on myfile: <proc name="vi" argregex=".\*myfile.\*" … <proc name="vi" argregex="/myrep/myfile.\*" … <proc name="vi" argregex="/myrep/myfile" … · Windows examples with the notepad editor on myfile: <proc name="notepad.exe" argregex=".\*myfile.\*" … <proc name="notepad.exe" argregex="c:\\myrep\\myfile.\*" … <proc name="notepad.exe" argregex="c:\\myrep\\myfile" … |
| action="restart"\|"stopstart"\|"stop"\|"noaction"\|"*executable\_name*" | Action (or handler) to execute on the module: · action="restart" triggers a local restart · action="stopstart" triggers a stopstart and may lead to a failover · action="stop" triggers a stop and may lead to a failover · action="noaction" only logs a message · action="*executable\_name*" runs a special handler. To avoid a loop on a reproducible fault, a maxloop counter is incremented at each restart/stopstart command; for the maxloop definition, see section 13.3. To define a special handler, set either an absolute path or a path relative to the "bin" directory of the module: SAFE/modules/*AM*/bin/. We recommend a relative path and a handler defined inside the module. When defining a special handler, a new class name must be associated with the monitored process. On success, a special handler must end with exit 0 on Linux, and with %SAFEBIN%\exitcode 0 on Windows. With any other exit code, SafeKit performs a stopstart command. When running special handlers, the maxloop counter is not incremented. To increment it, use: safekit incloop -m *AM* -i <handler name>. This command increments the counter and exits with status 1 when the limit has been reached. Default value: stopstart |
| [restart\_services="*targeted\_services*"] | Since SafeKit 8.2.6, the restart\_services attribute allows targeted restarts of specific services instead of restarting all services globally. This attribute contains the names of the services to restart, listed in startup order and separated by commas. Supported only if: · service="yes" · action="restart" · the module scripts support the -Services parameter. For a full description, see section 13.10.1.3. |
| [start\_after="*nb polling cycles*"] | Without the start\_after attribute, the monitoring of processes is effective immediately. Otherwise, it is delayed for (n-1)\*polltimer seconds, where: · n is the value of the start\_after parameter · polltimer is the value set on the <errd> tag (30 seconds by default). For example, if start\_after="3", monitoring is delayed for 60 seconds ((3-1)\*30). The start\_after parameter is useful if the process takes some time to start. Default value: 0 |
| atleast="1" | Minimum number of processes that must be running. If this minimum is not reached, SafeKit triggers an action. · name="oracle" argregex=".\*db1.\*" atleast="1" means that an action is triggered if fewer than one oracle instance is running on db1. · atleast="-1" disables this criterion. Default value: 1 |
| atmax="-1" | Maximum number of processes that can run. If this maximum is reached, SafeKit triggers an action. · atmax="-1" disables this criterion. · atmax="0": an action is triggered each time the process is started. Default value: -1 (criterion disabled) |
| </errd> |  |

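The start\_after, atleast and atmax rules above can be expressed as two small Python functions. This is a sketch of the documented arithmetic, not errd's code; the function names are hypothetical, and treating atmax as exceeded (running > atmax) rather than merely reached is an interpretation chosen to agree with the atmax="0" example.

```python
def monitoring_delay(start_after: int, polltimer: int = 30) -> int:
    """Seconds before monitoring starts: (n-1)*polltimer, 0 without start_after."""
    return max(start_after - 1, 0) * polltimer


def action_needed(running: int, atleast: int = 1, atmax: int = -1) -> bool:
    """True if the number of matching processes violates atleast or atmax.

    -1 disables a criterion. atmax is treated as exceeded when running > atmax,
    an interpretation consistent with the atmax="0" example.
    """
    too_few = atleast != -1 and running < atleast
    too_many = atmax != -1 and running > atmax
    return too_few or too_many
```

With the defaults, monitoring\_delay(3) gives the 60 seconds of the start\_after example.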
### 13.10.4 <errd> commands

**Note:** If the command is used inside a module script, the SAFEMODULE environment variable is set and the -m *AM* parameter is not necessary.


| Command | Description |
| --- | --- |
| safekit -r errdpoll\_running | This command prints into the file **SAFEVAR/errdpoll\_reserrd** (SAFEVAR=/var/safekit on Linux and SAFEVAR=c:\safekit\var on Windows if c: is the installation drive) one line for each running process, with the following fields: <pid> <command name> <command full name and arguments list> (parent=<parent pid>). On Windows, the command name is displayed in lower case. Useful to find the process name and its arguments for an <errd> configuration. |
| safekit errd disable "*classname*" -m *AM* | Suspends the monitoring of the processes included in the class classname (for the application module *AM*). Must be called explicitly in the stop\_... scripts, before stopping the application, for processes in a class other than prim, both, second, sec. |
| safekit errd enable "*classname*" -m *AM* | Resumes the monitoring of the processes defined with the class classname (for the application module *AM*). Must be called explicitly in the start\_... scripts, after starting the application, for processes in a class other than prim, both, second, sec. |
| safekit errd off -m *AM* | Suspends the monitoring of all processes except SafeKit processes (for the application module *AM*). Useful when manually stopping the application without triggering error detection. Important: with SafeKit < 8.2, use safekit errd suspend -m *AM* |
| safekit errd on -m *AM* | Resumes the monitoring of processes suspended with safekit errd off (for the application module *AM*). Important: with SafeKit < 8.2, use safekit errd resume -m *AM* |
| safekit errd list -m *AM* | Lists all processes monitored by SafeKit (including SafeKit processes) and defined in the application module *AM*. The displayed list may be truncated due to internal limits. The full list can be found in the file **SAFEVAR/modules/*AM*/errdlist**. SAFEVAR=/var/safekit on Linux and SAFEVAR=c:\safekit\var on Windows if c: is the installation drive. |
| safekit kill -name="*process\_name*" [-argregex="…"] -level="*kill\_level*" | The <errd> component must be running. · level="test": only display the process list · level="terminate": kill processes · level="9": send the SIGKILL signal to processes (Linux only) · level="15": send the SIGTERM signal to processes (Linux only). · Windows examples (see "class CAtlRegExp" for more information): safekit kill -name="notepad.exe" -argregex=".\*myfile.\*" -level="terminate" safekit kill -name="notepad.exe" -argregex="c:\\myrep\\myfile.\*" -level="terminate" · Linux examples ("man regex" for more information): safekit kill -name="vi" -argregex=".\*myfile.\*" -level="9" safekit kill -name="vi" -argregex="/myrep/myfile.\*" -level="9" |

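Since each line written by safekit -r errdpoll\_running follows the pattern <pid> <command name> <command full name and arguments list> (parent=<parent pid>), a short Python parser can help pick the name and argregex values for a <proc> entry. The regular expression below assumes exactly that layout with whitespace-separated fields; the class and function names are hypothetical and the sample lines in the test are made up.

```python
import re
from typing import List, NamedTuple, Optional


class ProcessLine(NamedTuple):
    pid: int
    name: str
    cmdline: str
    parent: int


# <pid> <command name> <full command and arguments> (parent=<parent pid>)
LINE_RE = re.compile(r"^(\d+)\s+(\S+)\s+(.*?)\s*\(parent=\s*(\d+)\)\s*$")


def parse_line(line: str) -> Optional[ProcessLine]:
    """Parse one line of errdpoll_reserrd; return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    pid, name, cmdline, parent = m.groups()
    return ProcessLine(int(pid), name, cmdline, int(parent))


def find_processes(lines: List[str], name: str) -> List[ProcessLine]:
    """Keep the parsed lines whose command name equals name."""
    parsed = (parse_line(line) for line in lines)
    return [p for p in parsed if p is not None and p.name == name]
```

The cmdline field of a match is the text an argregex would be tested against.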
## 13.11 Checkers - <check>

SafeKit provides checkers that test a
critical element and affect the state of a module resource based on the test
result. Upon error detection by a checker, the failover machine performs an
action on the module according to the failover rule associated with the
checker. For a complete description, see section 13.11.3.

The checkers provided by SafeKit are:

· section 13.12 “TCP checker - <tcp>”

· section 13.13 “Ping checker - <ping>”

· section 13.14 “Interface checker - <intf>”

· section 13.15 “IP checker - <ip>”

· section 13.16 “Custom checker - <custom>”

· section 13.17 “Module checker - <module>”

· section 13.18 “Splitbrain checker - <splitbrain>”

### 13.11.1 <check> example

All built-in checkers are configured under
a single <check> section:

<check>

  <!-- Insert below <tcp> <ping> <intf> <ip> <custom> <module> <splitbrain> tags -->

</check>

### 13.11.2 <check> syntax

<check>

  <tcp …>

    <to …/>

  </tcp>

  …

  <ping …>

    <to …/>

  </ping>

  …

  <intf …>

    <to …/>

  </intf>

  …

  <ip …>

    <to …/>

  </ip>

  …

  <custom …/>

  …

  <module …>

    [<to …/>]

  </module>

  …

  <splitbrain …/>

</check>

**Note:** The <check> tag and its full subtree can be changed with a dynamic configuration.

### 13.11.3 <checker> description

A checker tests a critical element (by
default every 10 seconds) and affects the state of the associated resource,
setting it to up or down based on the test result. The failover machine evaluates the
failover rules and executes the action associated with the checker when the
resource changes state.

![](safekituserguideen_fichiers/image386.jpg)

· The initial state of the resource is init. The failover machine keeps the module in the ![](safekituserguideen_fichiers/image387.jpg) WAIT (Transient) state as long as at least one resource used by a rule with a wait action is in the init state.

· If the test fails, the associated resource is set to down. The failover rule associated with the checker determines which action to take in this case. Possible actions on the module are restart, stop, stopstart, or wait.

o    The restart action triggers a local restart of the application without changing the module's state.

o    The actions stop, stopstart, and wait stop the module, and consequently the application; with stopstart and wait, the module is then restarted automatically. Stopping the module may trigger a failover to the other node if that node is ![](safekituserguideen_fichiers/image388.jpg) (Ready).

o    When the action is wait, the module remains in the ![](safekituserguideen_fichiers/image389.jpg) WAIT (NotReady) state as long as the resource is down.

The actions restart, stopstart, and wait increment the error detection counter. When this counter exceeds the maxloop limit within the time interval loop\_interval (by default, on the 4th error detection within 24 hours; see section 13.3.3), the module is stopped.

· If the test succeeds, the associated resource is set to up. If the associated action is wait, this triggers the implicit wakeup action: the module exits the ![](safekituserguideen_fichiers/image390.jpg) WAIT (NotReady) state and continues its normal startup process.
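The error detection counter can be pictured as a sliding time window. The Python sketch below stops the module when more than maxloop detections fall within loop\_interval, which reproduces the default behavior described above (stop on the 4th detection within 24 hours). Whether SafeKit uses a sliding window or resets the counter in some other way is an assumption here; section 13.3.3 is the authoritative definition, and the class name is hypothetical.

```python
from collections import deque


class LoopCounter:
    """Sliding-window model of the error detection counter (an assumption)."""

    def __init__(self, maxloop: int = 3, loop_interval: float = 24 * 3600):
        self.maxloop = maxloop
        self.loop_interval = loop_interval
        self.detections = deque()

    def record(self, now: float) -> bool:
        """Record a detection at time now (seconds); True means stop the module."""
        self.detections.append(now)
        # Forget detections older than loop_interval.
        while self.detections and now - self.detections[0] >= self.loop_interval:
            self.detections.popleft()
        return len(self.detections) > self.maxloop
```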


The configuration of the checker
determines:

· The name of the associated resource

· Optionally, the name of the associated failover rule and the action

#### 13.11.3.1 Module resource associated with a checker

· The initial state of the resource is init

· If the test fails, the associated resource is set to down

· If the test succeeds, the associated resource is set to up

For a description of the resources, see section 13.19.4.1.

The name of the resource associated with
the checker is determined from its configuration:

· The resource class is the value of the XML tag of the checker: tcp, ping, intf, ip, custom, module or splitbrain

· The resource id is the value of the ident attribute.

For example, for the following
configuration of a ping checker:

<check>

  <**ping** ident="**testR2**" action="wait">

    <to addr="R2"/>

  </ping>

</check>

The associated resource is named **ping.testR2**.
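Because the resource name is simply the checker tag followed by the ident value, it can be derived mechanically from a <check> section. The Python sketch below does this with xml.etree.ElementTree; the function name is hypothetical, and it assumes every checker element carries an ident attribute.

```python
import xml.etree.ElementTree as ET
from typing import List

CHECKER_TAGS = ("tcp", "ping", "intf", "ip", "custom", "module", "splitbrain")


def resource_names(check_xml: str) -> List[str]:
    """Resource name of each checker in a <check> section: <tag>.<ident>."""
    check = ET.fromstring(check_xml)
    return [f"{child.tag}.{child.get('ident')}"
            for child in check if child.tag in CHECKER_TAGS]
```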


The current value of the resource is
visible:

· via the web console as described in section 3.4.4.2

· with the command safekit state -v -m *AM* (where *AM* is the name of the module)

…

ping.testR2    down    yyyy-mm-dd

State changes of the resource are visible
in the module log:

· via the web console as described in section 3.4.4.1

· with the command safekit logview -A -m *AM* (where *AM* is the name of the module)

I | Resource ping.testR2 set to up by pingcheck

…

C | Resource ping.testR2 set to down by pingcheck
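To follow resource state changes from a script, the "Resource … set to … by …" messages shown above can be extracted from saved module log output, for example the output of safekit logview -A -m *AM*. The Python sketch below assumes the messages have exactly that form and ignores all other lines; the function name is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Matches e.g. "C | Resource ping.testR2 set to down by pingcheck"
RESOURCE_RE = re.compile(r"Resource (\S+) set to (\S+) by (\S+)")


def resource_changes(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (resource, new state, component) for each matching log line."""
    changes = []
    for line in lines:
        m = RESOURCE_RE.search(line)
        if m:
            changes.append((m.group(1), m.group(2), m.group(3)))
    return changes
```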

#### 13.11.3.2 Failover rule associated with a checker

The failover rule associated with the
checker defines which action to take when its resource goes down. For a
description of the failover rules, see section 13.19.4.2.

The possible actions for the module are restart, stop, stopstart
or wait.

The failover rule associated with the
checker is determined based on its configuration:

· The checkers intf, ip, module, and splitbrain have a predefined default rule that applies to all resources of that type:

/\* rule for module checkers \*/

module\_failure: if (module.? == down) then wait();

/\* rule for interface checkers \*/

interface\_failure: if (intf.? == down) then wait();

/\* rule for ip checkers \*/

ip\_failure: if (ip.? == down) then stopstart();

/\* rules for splitbrain \*/

splitbrain\_failure: if (splitbrain.uptodate == down) then wait();

· The checkers tcp, ping, and custom have a rule generated from the value of the action attribute if it is set to stop, stopstart, restart or wait.

For
example, for the following configuration of a ping checker:

<check>

  <**ping** ident="**testR2**" action="**wait**">

    <to addr="R2"/>

  </ping>

</check>

The generated rule is named **p\_testR2** and is equivalent to:

**p\_testR2**: if (**ping.testR2** == down) then **wait**();

The rule name is made of the first letter of the checker name (t, p or c), followed by \_, then the value of the ident attribute.
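The generation rule for tcp, ping and custom checkers can be reproduced in a few lines of Python. This sketch builds the rule text from the checker tag, ident and action as described above, and returns None for noaction (or for checker types that use the predefined rules), in which case no rule is generated. It does not apply the per-checker default actions, and the function name is hypothetical.

```python
from typing import Optional

# First letter of the checker tag, as described above.
RULE_PREFIX = {"tcp": "t", "ping": "p", "custom": "c"}
RULE_ACTIONS = {"stop", "stopstart", "restart", "wait"}


def generated_rule(checker: str, ident: str, action: str) -> Optional[str]:
    """Rule generated for a tcp/ping/custom checker, or None (e.g. noaction)."""
    if checker not in RULE_PREFIX or action not in RULE_ACTIONS:
        return None
    return (f"{RULE_PREFIX[checker]}_{ident}: "
            f"if ({checker}.{ident} == down) then {action}();")
```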

· The tcp, ping, and custom checkers do not have a failover rule if the value of the action attribute in their configuration is set to noaction. In this case, the user must explicitly add the associated failover rule in the module configuration. For example, for the following configuration of a custom checker, the failover rule is added explicitly:

<check>

  <**custom** ident="**checkfile**" exec="checker.ps1" arg="c:\safekit\checkfile" when="prim" action="**noaction**"/>

</check>


<failover>

  <![CDATA[

    **checkfile\_failure: if( custom.checkfile == down ) then restart();**

  ]]>

</failover>


When the failover rule is activated, it is
visible:

· Through the web console, in the detailed status of the module described in section 3.4.2.2

· By a message in the module log like the following:

C | Action wait according to the failover rule p\_testR2

The module log can be viewed:

o    Through the web console as described in section 3.4.4.1

o    Using the command safekit logview -A -m *AM* (where *AM* is the name of the module)

## 13.12 TCP checker - <tcp>

By default, there is a restart
action on the module when the tcp checker detects a connection failure to the TCP service.

Since SafeKit 8.2.3, the action can be
configured using the action attribute of the <tcp> tag.

**Important:** If the <check> section is already defined, insert the <tcp> tag into it.

### 13.12.1 <tcp> example

<check>

  <tcp ident="R1test" when="prim" action="restart">

    <to addr="R1" port="80"/>

  </tcp>

</check>

· The resource associated with the checker is named tcp.R1test (with the prefix tcp.)

· The generated failover rule, which performs a restart when the resource goes down, is named t\_R1test (with the prefix t\_) and is equivalent to:

t\_R1test: if (tcp.R1test == down) then restart();

For a description of checkers, refer to section 13.11.3.


**Note:** See also the example in section 15.5. It presents the configuration via the web console along with the corresponding userconfig.xml.

### 13.12.2 <tcp> syntax

  <tcp

     ident="tcp\_checker\_name"

     when="prim|second|both|pre"

     [action="stop|stopstart|restart|wait|noaction"]

  >

    <to

       addr="IP address or name to check"

       port="TCP port to check"

       [interval="10s"]

       [timeout="5s"]

    />

  </tcp>

**Important:** The <tcp> tag and its full subtree can be changed with a dynamic configuration.

**Important:** Since SafeKit 8.2.3, use the action attribute to define the action to be taken when an error is detected by the tcp checker. Before SafeKit 8.2.3, the action was static and defined by the default failover rule that applies to all tcp class resources:

tcp\_failure: if (tcp.? == down) then restart();

### 13.12.3 <tcp> attributes


|  |  |
| --- | --- |
| <tcp | Set as many <tcp> sections as there are TCP checkers. |
| ident="*tcp\_checker\_name*" | TCP checker name.            It defines the resource associated with the checker:  **tcp**.*tcp\_checker\_name* (with the prefix tcp.) |
| when="prim|second|both"  [action="stop|stopstart|restart|noaction"] | Use this value to test an internal TCP service of the application once it has started:  ·         when="prim" for a mirror module  The checker is started after/stopped before the execution of the start\_prim/stop\_prim scripts.  ·         when="both" for a farm module  The checker is started after/stopped before the execution of the start\_both/stop\_both scripts.  ·         when="second" for a mirror module  The checker is started after/stopped before the execution of the start\_second/stop\_second scripts.  Since SafeKit 8.2.3, you can configure the action to take when an error is detected with:  ·         action="stop|stopstart|restart"  stop, stopstart or restart the module. The name of the associated failover rule is **t\_***tcp\_checker\_name* (with the prefix t\_)  ·         action="noaction"  No action is generated automatically. The action must be explicitly written in the <failover> tag (see section 13.19).  Default value: action="restart" |
| when="pre"  action="wait\|noaction" | Use this value to test an external TCP service before the application starts:  ·         when="pre"  The checker starts after/ends before the execution of the prestart/poststop scripts.  Since SafeKit 8.2.3, you can configure the action to be taken in case of error detection with:  ·         action="wait"  Wait on the module. The name of the associated failover rule is **t\_***tcp\_checker\_name* (with the prefix t\_)  ·         action="noaction"  No failover rule is generated. The action must be explicitly written in the <failover> tag (see section 13.19). |
| <to |  |
| addr="*IP address or name*" | IP address or name to check (e.g., 127.0.0.1 for a local service).  IPv4 or IPv6 address. |
| port="*value*" | TCP port to check. |
| [interval="10s"] | Interval in seconds between two connection attempts.  Default value: 10s (10 seconds).  Note: time unit supported since SafeKit 8.2.5 (see section 13.1). |
| [timeout="5s"] | Connection establishment timeout in seconds.  Default value: 5s (5 seconds).  Note: time unit supported since SafeKit 8.2.5 (see section 13.1). |
| </tcp> |  |

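The connection trial described by the `addr`, `port`, `interval` and `timeout` attributes can be pictured with the minimal Python sketch below. It is an illustration under stated assumptions, not SafeKit's implementation: the functions `tcp_probe` and `tcp_checker_states` are hypothetical, and the sketch assumes that a trial succeeds when a TCP connection to `addr`:`port` is established within `timeout` seconds, and that trials are repeated every `interval` seconds.

```python
import socket
import time
from typing import Iterator


def tcp_probe(addr: str, port: int, timeout: float = 5.0) -> bool:
    """Hypothetical helper: one connection trial.

    Returns True if a TCP connection to addr:port is established within
    `timeout` seconds, False otherwise.
    """
    try:
        # create_connection resolves host names and accepts IPv4 or IPv6 addresses.
        with socket.create_connection((addr, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, unreachable host, or timeout (socket.timeout is an OSError).
        return False


def tcp_checker_states(addr: str, port: int, interval: float = 10.0,
                       timeout: float = 5.0, trials: int = 3) -> Iterator[str]:
    """Hypothetical helper: yield the resource state ("up" or "down")
    after each trial, waiting `interval` seconds between two trials."""
    for i in range(trials):
        yield "up" if tcp_probe(addr, port, timeout) else "down"
        if i < trials - 1:
            time.sleep(interval)
```

For example, `list(tcp_checker_states("127.0.0.1", 80, interval=10, timeout=5, trials=3))` would report the state of a local service on port 80 three times, 10 seconds apart. The sketch stops after a fixed number of trials only to stay short; an actual checker keeps running until it is stopped with its scripts.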
## 13.13      Ping checker - <ping>

By default, there is a wait
action on the module when the ping checker detects a ping failure on a device.

Since SafeKit 8.2.3, the action can be
configured using the action attribute of the <ping> tag.

|  |  |
| --- | --- |
| Important | Insert the <ping> tag into the <check> section if that section is already defined. |

### 13.13.1  <ping> example

<check>

  <ping ident="testR2" action="wait">

    <to addr="R2"/>

  </ping>

</check>

·         The resource associated with the checker is named ping.testR2 (with the prefix ping.)

·         The generated failover rule, which performs a wait when the resource goes down, is named p\_testR2 (with the prefix p\_) and is equivalent to:

p\_testR2: if (ping.testR2 == down) then wait();

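The naming scheme above (resource `ping.<ident>`, generated rule `p_<ident>`) and the `tcp.`/`t_` scheme of the TCP checker can be summarized with the hypothetical Python helper `generated_rule` below. It only rebuilds the rule text as written in this guide and is not a SafeKit API; it assumes, based on the `wait()` and `restart()` rules quoted in this chapter, that each action value maps to a call of the same name.

```python
from typing import Optional

# Resource prefix and generated-rule prefix per checker class,
# as described for <tcp> (tcp., t_) and <ping> (ping., p_).
PREFIXES = {"tcp": ("tcp.", "t_"), "ping": ("ping.", "p_")}

# Assumption: each action value is written as a call of the same name.
ACTIONS = {"wait", "stop", "stopstart", "restart"}


def generated_rule(checker: str, ident: str, action: str) -> Optional[str]:
    """Hypothetical helper: return the failover rule generated for a checker,
    or None for action="noaction" (the rule must then be written by hand
    in the <failover> tag)."""
    resource_prefix, rule_prefix = PREFIXES[checker]
    if action == "noaction":
        return None
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    resource = resource_prefix + ident
    return f"{rule_prefix}{ident}: if ({resource} == down) then {action}();"
```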
For a description of checkers, refer to section 13.11.3.

 

|  |  |
| --- | --- |
| Example | See also the example in section 15.6. It presents the configuration via the web console along with the corresponding userconfig.xml. |

### 13.13.2  <ping> syntax

  <ping

   ident="ping\_checker\_name"

   [when="pre|prim|second|both"]

   [action="wait|stop|stopstart|restart|noaction"]

  >

    <to

      addr="IP address or name to check"

      [interval="10s"]

      [timeout="5s"]

    />

  </ping>

|  |  |
| --- | --- |
| Important | The <ping> tag and its full subtree can be changed with a dynamic configuration. |

 

|  |  |
| --- | --- |
| Important | Since SafeKit 8.2.3, use the action attribute to define the action taken when the ping checker detects an error.  Before SafeKit 8.2.3, the action was static and defined by the default failover rule that applies to all ping class resources:  ping\_failure: if (ping.? == down) then wait(); |

### 13.13.3  <ping> attributes

|  |  |
| --- | --- |
| <ping | Set as many <ping> sections as there are ping checkers. |
| ident="*ping\_checker\_name*" | Ping checker name.  It defines the resource associated with the checker:  **ping**.*ping\_checker\_name* (with the prefix ping.) |
| when="pre"  action="wait\|noaction" | Use this value to test an external device before the application starts.  ·         when="pre"  The checker starts after/ends before the execution of the prestart/poststop scripts.  Since SafeKit 8.2.3, you can configure the action to be taken in case of error detection with:  ·         action="wait"  Wait on the module. The name of the associated failover rule is **p\_***ping\_checker\_name* (with the prefix p\_)  ·         action="noaction"  No failover rule is generated. The action must be explicitly written in the <failover> tag (see section 13.19).  Default value: when="pre" action="wait" |
| when="prim\|second\|both"  action="stop\|stopstart\|restart\|noaction" | Use this value to test a device after the application has started:  ·         when="prim" for a mirror module  The checker is started after/stopped before the execution of the start\_prim/stop\_prim scripts.  ·         when="both" for a farm module  The checker is started after/stopped before the execution of the start\_both/stop\_both scripts.  ·         when="second" for a mirror module  The checker is started after/stopped before the execution of the start\_second/stop\_second scripts.  Since SafeKit 8.2.3, you can configure the action to take when an error is detected with:  ·         action="stop\|stopstart\|restart"  Stop, stopstart or restart the module. The name of the associated failover rule is **p\_***ping\_checker\_name* (with the prefix p\_)  ·         action="noaction"  No failover rule is generated automatically. The action must be explicitly written in the <failover> tag (see section 13.19). |
| <to |  |
| addr="*IP address or name*" | External IP address or name to check.  IPv4 or IPv6 address. |
| [interval="10s"] | Interval in seconds between two ping requests.  Default value: 10s (10 seconds).  Note: time unit supported since SafeKit 8.2.5 (see section 13.1). |
| [timeout="5s"] | Timeout in seconds for the ping reply.  Default value: 5s (5 seconds).  Note: time unit supported since SafeKit 8.2.5 (see section 13.1). |
| </ping> |  |

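The combinations allowed by the table above can be checked with the hypothetical Python function `check_ping_attrs` below, a sketch that is not part of SafeKit. It applies the documented defaults (when="pre", action="wait") and the pairing of when values with module types. Because the table does not state a default action for when="prim", "second" or "both", the sketch assumes none and requires the action to be set explicitly in that case.

```python
from typing import Optional, Tuple

# Allowed action values per when value, as listed in the <ping> attribute table.
PRE_ACTIONS = {"wait", "noaction"}
POST_ACTIONS = {"stop", "stopstart", "restart", "noaction"}

# Module type for which each post-start when value is documented.
WHEN_MODULE = {"prim": "mirror", "second": "mirror", "both": "farm"}


def check_ping_attrs(module_type: str, when: Optional[str] = None,
                     action: Optional[str] = None) -> Tuple[str, str]:
    """Hypothetical validator: return the effective (when, action) pair
    for a <ping> checker, or raise ValueError for an invalid combination."""
    when = when or "pre"  # documented default
    if when == "pre":
        action = action or "wait"  # documented default
        allowed = PRE_ACTIONS
    elif when in WHEN_MODULE:
        if WHEN_MODULE[when] != module_type:
            raise ValueError(f'when="{when}" applies to a {WHEN_MODULE[when]} module')
        if action is None:
            # Assumption: no documented default action for this case.
            raise ValueError(f'set action explicitly with when="{when}"')
        allowed = POST_ACTIONS
    else:
        raise ValueError(f"unknown when value: {when}")
    if action not in allowed:
        raise ValueError(f'action="{action}" is not valid with when="{when}"')
    return when, action
```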
## 13.14      Interface checker - <intf>

By default, there is a wait
action on the module when the intf checker detects a failure on the interface.

|  |  |
| --- | --- |
| Important | Insert the <intf> tag into the <check> section if that section is already defined. |

### 13.14.1  <intf> example

<check>

  <intf ident="test\_eth0">

    <to local\_addr="192.168.1.10"/>

  </intf>

</check>

·         The resource associated with the checker is named intf.test\_eth0 (with the prefix intf.)

·         The failover rule that performs a wait when an intf class resource goes down is static and predefined; it is the default failover rule:

interface\_failure: if (intf.? == down) then wait();

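Because `interface_failure` tests `intf.?`, it applies to every resource of the intf class rather than to a single checker. The Python sketch below illustrates that reading under an assumption: `?` is treated as matching any checker name within the class. The function `class_rule_action` and the dictionary of resource states are hypothetical; this is not how SafeKit evaluates rules internally.

```python
from typing import Dict, Optional


def class_rule_action(states: Dict[str, str], resource_class: str,
                      action: str) -> Optional[str]:
    """Hypothetical evaluator for a class-wide rule such as
        interface_failure: if (intf.? == down) then wait();
    Returns `action` if any resource of the class is down, else None."""
    prefix = resource_class + "."
    for name, state in states.items():
        # Assumption: "intf.?" matches every resource named intf.<anything>.
        if name.startswith(prefix) and state == "down":
            return action
    return None
```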
For a description of checkers, refer to section 13.11.3.

 

|  |  |
| --- | --- |
| Example | See also the example in section 15.10. It presents the configuration via the web console along with the corresponding userconfig.xml. |

### 13.14.2  <intf> syntax

  <intf

   ident="intf\_checker\_name"

   [when="pre"]

  >

    <to

      local\_addr="interface\_physical\_IP\_address"/>

  </intf>

### 13.14.3  <intf> attributes

|  |  |
| --- | --- |
| <intf | Note: <intf> sections are automatically generated for a network interface when <interface check="on"> is set (see the virtual IP definition in section 13.6). |
| ident="*intf\_checker\_name*" | Interface checker name.  It defines the resource associated with the checker:  **intf**.*intf\_checker\_name* (with the prefix intf.) |
| [when="pre"] | Fixed value.  ·         when="pre"  The checker starts after/ends before the execution of the prestart/poststop scripts.  In case of error detection, the action is wait. The name of the failover rule, **interface\_failure**, is static and predefined. |
| <to local\_addr="IP address" /> | Physical IP address configured on the network interface to check.  IPv4 or IPv6 address. |
| </intf> |  |

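Unlike the `addr` attribute of <tcp> and <ping>, which accepts an IP address or a name, `local_addr` is described as a physical IP address. The hypothetical Python helper below, which is not part of SafeKit, sketches a pre-check of that value with the standard `ipaddress` module; rejecting host names is an assumption drawn from the wording of the table, not a documented SafeKit check.

```python
import ipaddress


def valid_local_addr(value: str) -> bool:
    """Hypothetical pre-check: True if `value` is a literal IPv4 or IPv6
    address, as expected by <to local_addr="..."/>."""
    try:
        ipaddress.ip_address(value)  # raises ValueError for names or malformed input
        return True
    except ValueError:
        return False
```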
