---
canonical: https://safekit.evidian.com/wp-content/uploads/downloads_safekit/version-82/safekituserguidehtml/documentation/safekituserguideen.htm
---

# 4. Tests

- Section 4.1 “Installation and tests after boot”
- Section 4.2 “Tests of a mirror module”
- Section 4.3 “Tests of a farm module”
- Section 4.4 “Tests of checkers common to mirror and farm”

The following tests help you understand how SafeKit works and check that the deployed solution returns the expected results. They can be used as a basis for acceptance testing at a client's site.

Analyzing the test results may require consulting the module log, the scripts log (which contains the output of the module scripts) and the state of the module resources. To read these logs and resources, see section 7.4.

## 4.1             Installation and tests after boot

### 4.1.1         Test package installation

In the following, replace node1 with the node name and *AM* with the module name.

- safekit -p, executed on the nodes, returns among other values SAFE, the SafeKit root installation path, and SAFEVAR, the SafeKit working directory:
  - in Windows:  
    SAFE=C:\safekit if %SYSTEMDRIVE%=C:  
    SAFEVAR=C:\safekit\var
  - in Linux:  
    SAFE="/opt/safekit"  
    SAFEVAR="/var/safekit"

For details, see section 10.1.
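When these checks are scripted, SAFE and SAFEVAR can be read from the output of safekit -p. The Python sketch below assumes only that each value is printed on its own line as NAME=value, optionally enclosed in double quotes as in the Linux example above; the function name parse_safekit_paths is hypothetical, and all other lines of the output are ignored.

```python
import re

# Matches lines such as SAFE="/opt/safekit" or SAFEVAR=C:\safekit\var
# (assumption: one NAME=value pair per line, as in the examples above).
_VAR_LINE = re.compile(r'^\s*(SAFE|SAFEVAR)=(.*?)\s*$')


def parse_safekit_paths(output: str) -> dict:
    """Return {'SAFE': ..., 'SAFEVAR': ...} found in `safekit -p` output."""
    paths = {}
    for line in output.splitlines():
        match = _VAR_LINE.match(line)
        if match:
            name, value = match.groups()
            # Strip optional surrounding double quotes (Linux style).
            if len(value) >= 2 and value[0] == value[-1] == '"':
                value = value[1:-1]
            paths[name] = value
    return paths
```

The returned SAFEVAR value can then be used to build paths such as SAFEVAR/modules/*AM* in later checks.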

- **userconfig.xml** of a mirror (or farm) module and its scripts **start\_prim**/**start\_both** and **stop\_prim**/**stop\_both** can be edited:
  - with the web console at /console/en/configuration/modules/*AM*/config
  - under the directory SAFE/modules/*AM* on node1

- The module log and the scripts log (which contains the output of the module scripts) of the module on one node can be analyzed:
  - with the web console at /console/en/monitoring/nodes/node1/modules/*AM*/logs
  - with the command safekit logview -m *AM*, executed on node1, for the module log
  - on node1, in the files SAFEVAR/modules/*AM*/userlog\_<year>\_<month>\_<day>T<time>\_<script name>.ulog, for the scripts logs (output messages of the scripts)

### 4.1.2         Test license and version

safekit level returns:

Host: <hostname>  
OS: <OS version>  
SafeKit: <SafeKit version>  
License: Demo (No license) | Invalid Product | Invalid Host | … Expiration… | <license id> for <hostname>…  
or License: Expired license

- "Demo (No license)" means that there is no license in SAFE/conf/; the product stops every 3 days
- "Invalid Product" means an expired license in SAFE/conf/license.txt
- "Invalid Host" means that SAFE/conf/license.txt contains no valid hostname
- " …Expiration…" means a temporary key
- "<license id> for <hostname>" means a permanent license
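The meanings listed above can be turned into a check on the License: line of safekit level. A minimal Python sketch, assuming that line keeps the format shown above: it only looks for the listed markers (plus "Expired license"), tests the more specific markers first, and the function name and returned labels are hypothetical.

```python
# Markers from the list above, most specific first. The " for " test is
# last because it identifies a permanent license only when nothing else matched.
LICENSE_MARKERS = [
    ("Demo (No license)", "demo"),
    ("Expired license", "expired"),
    ("Invalid Product", "invalid product"),
    ("Invalid Host", "invalid host"),
    ("Expiration", "temporary key"),
    (" for ", "permanent license"),
]


def classify_license(level_output: str) -> str:
    """Classify the License: line of `safekit level` output."""
    for line in level_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("License:"):
            value = stripped[len("License:"):].strip()
            for marker, label in LICENSE_MARKERS:
                if marker in value:
                    return label
            return "unknown"
    return "no License line"
```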

Click here to get a one-month temporary key for any OS and any hostname.

Click here to get a permanent key based on the hostname and OS.

### 4.1.3         Test SafeKit services and modules after boot

In Windows, see also section 10.4.

**Test safeadmin service**

The safeadmin service must be started automatically at boot. To check its state:

| OS | Procedure and expected output |
| --- | --- |
| Windows | 1. Open a PowerShell console as administrator.<br>2. Run Get-Service -name safeadmin<br><br>Status   Name               DisplayName<br>------   ----               -----------<br>Running  safeadmin          safeadmin |
| Linux | 1. Open a shell console as root.<br>2. Run systemctl status safeadmin<br><br>Redirecting to /bin/systemctl status safeadmin.service<br>● safeadmin.service - The SafeKit Administration Daemon<br>Loaded: loaded (/usr/lib/systemd/system/safeadmin.service; enabled; vendor preset: disabled)<br>Active: active (running) since Tue 2024-11-12 17:30:56 CET; 20h ago<br>… |

When the safeadmin service is not running, all safekit commands fail and return, for example:

safekit level

Waiting for safeadmin ..........  
Error: safeadmin administrator daemon not running

Refer to section 9.1.1 for starting the safeadmin service.

**Test safewebserver service**

By default, the safewebserver service must be started automatically at boot. To check its state:

| OS | Procedure and expected output |
| --- | --- |
| Windows | 1. Open a PowerShell console as administrator.<br>2. Run Get-Service -name safewebserver<br><br>Status   Name               DisplayName<br>------   ----               -----------<br>Running  safewebserver      safewebserver |
| Linux | 1. Open a shell console as root.<br>2. Run systemctl status safewebserver<br><br>Redirecting to /bin/systemctl status safewebserver.service<br>● safewebserver.service - SafeKit Apache Server<br>Loaded: loaded (/usr/lib/systemd/system/safewebserver.service; enabled; vendor preset: disabled)<br>Active: active (running) since Wed 2024-11-13 11:01:31 CET; 2h 58min ago<br>… |

 

When the safewebserver service is not running, the following features are unavailable:

- the SafeKit web console, which then displays:

  ![](safekituserguideen_fichiers/image163.jpg)

- the module checker
- the distributed command line interface, which returns for example:

  safekit -H "\*" level

  ---------------- Server=https://10.0.0.107:9453 ----------------

  curl: (7) Failed to connect to 10.0.0.107 port 9453 after 1022 ms: Couldn't connect to server

  ---------------- Server=https://10.0.0.108:9453 ----------------

  curl: (28) Failed to connect to 10.0.0.108 port 9453 after 21024 ms: Couldn't connect to server

Refer to section 9.1.2 for starting the safewebserver service.

**Test SNMP service**

SNMP monitoring is not enabled by default. Refer to section 10.11 to enable it.

In Windows, it relies on the Net-SNMP Agent service. In Linux, it relies on the standard snmpd service. To check its state:

| OS | Procedure and expected output |
| --- | --- |
| Windows | 1. Open a PowerShell console as administrator.<br>2. Run Get-Service -name "Net-SNMP Agent"<br><br>Status   Name               DisplayName<br>------   ----               -----------<br>Running  Net-SNMP Agent      Net-SNMP Agent |
| Linux | 1. Open a shell console as root.<br>2. Run systemctl status snmpd<br><br>Redirecting to /bin/systemctl status snmpd.service<br>● snmpd.service - …<br>Active: active (running) since Wed 2024-11-13 11:01:31 CET; 2h 58min ago<br>… |

 

When the service is not running, SNMP monitoring is unavailable.

Refer to section 9.1.4 for starting the service.
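On Linux, the service checks above can be scripted. A minimal Python sketch: is_running only looks for the "Active: active (running)" line shown in the systemctl status outputs above, and check_services relies on systemctl is-active, a standard systemd command that prints active for a running unit. The function names are hypothetical, and snmpd only needs to be checked when SNMP monitoring is enabled.

```python
import subprocess

# snmpd is relevant only when SNMP monitoring has been enabled.
SERVICES = ["safeadmin", "safewebserver", "snmpd"]


def is_running(status_output: str) -> bool:
    """True if `systemctl status` output reports the unit as active (running)."""
    return any(line.strip().startswith("Active: active (running)")
               for line in status_output.splitlines())


def check_services(services=SERVICES) -> dict:
    """Query each unit with `systemctl is-active` (Linux with systemd only)."""
    results = {}
    for name in services:
        proc = subprocess.run(["systemctl", "is-active", name],
                              capture_output=True, text=True)
        results[name] = proc.stdout.strip() == "active"
    return results
```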

**Test application modules**

- safekit boot status displays whether each module is started at boot ("on") or not ("off")
- safekit state displays the state of all configured modules: STOP (mirror or farm), WAIT (mirror or farm), ALONE (mirror), PRIM (mirror), SECOND (mirror), UP (farm)
- to check the processes of a module, see section 10.2. To list the processes of the *AM* module, execute:

  safekit -r processtree list all *AM*

  This command returns all processes with *AM* in their arguments.

- safekit module listid displays the names of the installed modules with their ids: the id of a module must be the same on all servers
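In a test harness, the states listed above for safekit state can be kept as a lookup table. A minimal Python sketch; the dictionary and function names are hypothetical, and they encode only the states named in this section and the stable PRIM/SECOND pair of a 2-node mirror.

```python
# States named in this section for `safekit state`, per module type.
VALID_STATES = {
    "mirror": {"STOP", "WAIT", "ALONE", "PRIM", "SECOND"},
    "farm": {"STOP", "WAIT", "UP"},
}


def is_valid_state(module_type: str, state: str) -> bool:
    """True if `state` is one of the states listed for this module type."""
    return state.upper() in VALID_STATES.get(module_type.lower(), set())


def is_stable_mirror(state_node1: str, state_node2: str) -> bool:
    """A 2-node mirror is stable when one node is PRIM and the other SECOND."""
    return sorted([state_node1.upper(), state_node2.upper()]) == ["PRIM", "SECOND"]
```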

### 4.1.4         Test start of SafeKit web console

For details on the web console, refer to section 3.

- connect a web browser to http://host:9010
- the web console home page is displayed
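The same check can be run from a script. A minimal Python sketch using only the standard library, assuming the console answers plain HTTP on port 9010 as stated above; the function names are hypothetical, and any 2xx or 3xx answer is treated as success.

```python
import urllib.request


def console_url(host: str, port: int = 9010) -> str:
    """URL of the SafeKit web console home page on `host`."""
    return f"http://{host}:{port}"


def console_reachable(host: str, port: int = 9010, timeout: float = 5.0) -> bool:
    """True if the console answers an HTTP GET with a 2xx/3xx status."""
    try:
        with urllib.request.urlopen(console_url(host, port), timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except OSError:
        # URLError, HTTPError and socket timeouts all derive from OSError.
        return False
```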

## 4.2             Tests of a mirror module

### 4.2.1         Test first start of a mirror module on 2 servers STOP (NotReady)

On the first start of the module after its configuration:

- message in the logs of both servers (to read logs, see section 7.4):

  "Action start called by admin@<IP>via<IP>/SYSTEM/root"

- the module goes to the state ![](safekituserguideen_fichiers/image165.png) WAIT (NotReady) on both servers, with in the log:

  "Action wait from failover rule rfs\_notuptodate\_server"  
  "Data may be not uptodate for replicated directories (wait for the start of the remote server)"  
  "If you are sure that this server has valid data, run safekit stop, then safekit prim to force start as primary"

For the first start of a mirror module with replicated directories, the user must force the node that holds the up-to-date data to start as primary. Refer to section 5.3.

### 4.2.2         Test start of a mirror module on 2 servers STOP (NotReady)

For subsequent starts:

- message in the logs of both servers (to read logs, see section 7.4):

  "Action start called by admin@<IP>via<IP>/SYSTEM/root"

- the module reaches the stable state ![](safekituserguideen_fichiers/image167.png) PRIM (Ready) on one server and ![](safekituserguideen_fichiers/image167.png) SECOND (Ready) on the other, with in the log of the first server:

  "Remote state SECOND Ready"  
  "Local state PRIM Ready"

  and in the log of the other server:

  "Local state SECOND Ready"  
  "Remote state PRIM Ready"

- the application is started by the start\_prim script of the module on the PRIM server, with the message in the log:

  "Script start\_prim"

### 4.2.3         Test stop of a mirror module on the server PRIM (Ready)

On the stopping node:

- message in the log of the stopped node (to read logs, see section 7.4):

  "Action stop called by admin@<IP>/SYSTEM/root"

- the stopped node runs the stop\_prim script of the module, which stops the application on the server, with the message in the log:

  "Script stop\_prim"

- the module becomes ![](safekituserguideen_fichiers/image166.png) STOP (NotReady), with the message in the log:

  "Local state STOP NotReady"

On the other node:

- the node performs a failover, with the message in the log:

  "Action alone called by heart: remote stop"

- the application is started by the start\_prim script, with the message in the log:

  "Script start\_prim"

- the module becomes ![](safekituserguideen_fichiers/image168.png) ALONE (Ready), with the message in the log:

  "Local state ALONE Ready"
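When this test is automated, a simple approach is to check that the expected messages appear in the module log in the listed order. A minimal Python sketch, assuming the module log of each node is available as text (for example, the output of safekit logview -m *AM* saved to a file; how it is exported is an assumption); the helper and list names are hypothetical, and messages are matched as substrings.

```python
from typing import Sequence


def messages_in_order(log_text: str, expected: Sequence[str]) -> bool:
    """True if every expected message appears, in this order, in log_text."""
    position = 0
    for message in expected:
        index = log_text.find(message, position)
        if index < 0:
            return False
        position = index + len(message)
    return True


# Expected sequences for this test, taken from the messages listed above.
STOPPED_NODE = ["Action stop called by", "Script stop_prim",
                "Local state STOP NotReady"]
OTHER_NODE = ["Action alone called by heart: remote stop", "Script start_prim",
              "Local state ALONE Ready"]
```

The same helper can be reused with the messages listed for the other tests of this section.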

### 4.2.4         Test start of a mirror module on the server STOP (NotReady)

Start the module on a node while the other node is ![](safekituserguideen_fichiers/image170.png) ALONE (Ready).

- message in the log of the starting module (to read logs, see section 7.4):

  "Action start called by admin@<IP>/SYSTEM/root"

- the ![](safekituserguideen_fichiers/image169.png) STOP (NotReady) module becomes ![](safekituserguideen_fichiers/image167.png) SECOND (Ready)
- the ![](safekituserguideen_fichiers/image170.png) ALONE (Ready) module becomes ![](safekituserguideen_fichiers/image170.png) PRIM (Ready) and continues to run the application

### 4.2.5         Test restart of a mirror module on the server PRIM (Ready)

- message in the log of the server where the restart command is run (to read logs, see section 7.4):

  "Action restart called by admin@<IP>/SYSTEM/root"

- the PRIM module becomes ![](safekituserguideen_fichiers/image172.png) PRIM (Transient), then ![](safekituserguideen_fichiers/image173.png) PRIM (Ready)
- the module scripts stop\_prim and start\_prim are executed on the PRIM server and restart the application locally, with the messages in the log:

  "Script stop\_prim"  
  "Script start\_prim"

- the module on the other server stays ![](safekituserguideen_fichiers/image174.png) SECOND (Ready)

### 4.2.6         Test virtual IP address of a mirror module

The test below uses the arp command, which is specific to IPv4. For an IPv6 VIP or for issues with VIP ↔ MAC resolution, see section 7.24.

| Configuration and procedure | Expected result |
| --- | --- |
| Mirror module in the state PRIM (Ready) on node1 and SECOND (Ready) on node2.<br><br>userconfig.xml:<br><vip> <interface\_list> <interface check="on"> <real\_interface> <virtual\_addr addr="virtip" where="one\_side\_alias" check="on"/> </real\_interface> </interface> </interface\_list> </vip><br><br>1. On an external workstation (or server) in the same LAN, ping both physical IP addresses and the virtual IP address, then display the ARP table:<br>ping node1\_ip\_address<br>ping node2\_ip\_address<br>ping virtip<br>arp -a<br>2. Run safekit stopstart -m *AM* on the primary server (where *AM* is the module name).<br>3. On the external workstation (or server):<br>ping node1\_ip\_address<br>ping node2\_ip\_address<br>ping virtip<br>arp -a<br><br>Note: ping virtip again before looking at the ARP table, because the entry may be marked obsolete and is refreshed only after a ping. | 1. On node1, ipconfig /all (Windows) or ip addr show (Linux) returns virtip as an alias on the network interface.<br>On the external workstation (or server), the 3 pings respond.<br>On the external workstation (or server) in the same LAN, virtip is mapped to the same MAC address as node1\_ip\_address:<br>arp -a<br>node1\_ip\_address 00-0c-29-0a-5c-fc<br>node2\_ip\_address 00-0c-29-26-44-93<br>virtip 00-0c-29-0a-5c-fc<br>2. After the stopstart, SECOND (Ready) on node1 and PRIM (Ready) on node2.<br>In the verbose log of the new primary, message:<br>"Virtual IP <virtip of mirror> set"<br>3. On node2, ipconfig /all (Windows) or ip addr show (Linux) returns virtip as an alias on the network interface.<br>On the external workstation (or server), the 3 pings respond.<br>On the external workstation (or server), virtip is mapped to the same MAC address as node2\_ip\_address:<br>arp -a<br>node1\_ip\_address 00-0c-29-0a-5c-fc<br>node2\_ip\_address 00-0c-29-26-44-93<br>virtip 00-0c-29-26-44-93 |
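Comparing ARP entries by eye is error-prone when the test is repeated. The Python sketch below parses arp -a output and checks that virtip resolves to the same MAC address as the expected node. It assumes the two common layouts: the Windows table (address, then dash-separated MAC) and the Linux net-tools format "? (address) at MAC ...". The function names and the IP addresses used in the test are hypothetical.

```python
import re

# Windows: "  10.0.0.107            00-0c-29-0a-5c-fc     dynamic"
_WIN = re.compile(
    r"^\s*(\d{1,3}(?:\.\d{1,3}){3})\s+([0-9a-fA-F]{2}(?:-[0-9a-fA-F]{2}){5})\b")
# Linux net-tools: "? (10.0.0.107) at 00:0c:29:0a:5c:fc [ether] on eth0"
_LINUX = re.compile(
    r"\((\d{1,3}(?:\.\d{1,3}){3})\) at ([0-9a-fA-F]{2}(?::[0-9a-fA-F]{2}){5})")


def parse_arp(output: str) -> dict:
    """Map IPv4 address -> MAC (lowercase, dash-separated) from `arp -a`."""
    table = {}
    for line in output.splitlines():
        match = _WIN.match(line) or _LINUX.search(line)
        if match:
            ip, mac = match.groups()
            table[ip] = mac.lower().replace(":", "-")
    return table


def vip_follows(table: dict, virtip: str, node_ip: str) -> bool:
    """True if the virtual IP resolves to the same MAC as the given node."""
    return virtip in table and table.get(virtip) == table.get(node_ip)
```

Before the stopstart, vip_follows(table, virtip, node1_ip_address) should be True; after it, the same call with node2_ip_address should be True.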

 

### 4.2.7         Test file replication of a mirror module

| Configuration and procedure | Expected result |
| --- | --- |
| Mirror module in the state PRIM (Ready) on node1 and SECOND (Ready) on node2.<br><br>userconfig.xml in Windows:<br><rfs> <replicated dir="c:\replicated" mode="read\_only" /> </rfs><br>userconfig.xml in Linux:<br><rfs> <replicated dir="/replicated" mode="read\_only" /> </rfs><br><br>1. On the PRIM (Ready) server, go to /replicated and create a file file1.txt.<br>2. On the SECOND (Ready) server, go to /replicated and try to delete file1.txt.<br>3. Stop the module on the PRIM (Ready) server and wait for STOP (NotReady). Then go to the other server, which is ALONE (Ready), and create a new file file2.txt.<br>4. Start the module again on the STOP (NotReady) server and wait for SECOND (Ready). | 1. file1.txt has been replicated under /replicated on the SECOND (Ready) server.<br>2. Failure, because the /replicated directory is read-only on the SECOND (Ready) server.<br>3. file2.txt is not replicated in /replicated of the STOP (NotReady) server.<br>4. file2.txt is reintegrated on the restarted server. During reintegration, the server is SECOND (Transient).<br>In the log of the reintegrated Linux server, message:<br>"Updating directory tree from /replicated\_For\_SafeKit\_Replication"<br>In the log of the reintegrated Windows server, message:<br>"Updating directory tree from c:\replicated"<br>At the end of the /replicated reintegration, if at least one file with modified data has been reintegrated from the primary server to the secondary server, messages:<br>"Copied <reintegration statistics>"<br>"Reintegration ended (synchronize)"<br>The "Copied" message gives statistics for the reintegrated directory: reintegrated size, number of files, time, and network throughput in KB/sec.<br>Note: reintegrate a file larger than 100 MB to get reliable statistics.<br>At the end of reintegration, the server is SECOND (Ready). |
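For the note on reliable statistics, a test file larger than 100 MB can be created in the replicated directory of the PRIM (Ready) server. A minimal Python sketch; the directory, file name and size are assumptions to adapt (for example /replicated or c:\replicated), and create_test_file is a hypothetical helper. The data is random and written in chunks so that the whole file is never held in memory.

```python
import os


def create_test_file(directory: str, name: str = "bigfile.bin",
                     size_mb: int = 120, chunk_mb: int = 4) -> str:
    """Write `size_mb` MiB of random data to directory/name; return its path."""
    path = os.path.join(directory, name)
    chunk = chunk_mb * 1024 * 1024
    remaining = size_mb * 1024 * 1024
    with open(path, "wb") as handle:
        while remaining > 0:
            n = min(chunk, remaining)
            handle.write(os.urandom(n))
            remaining -= n
    return path
```

For example, create_test_file("/replicated") writes /replicated/bigfile.bin (120 MiB by default) before the module is stopped on the PRIM (Ready) server.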

### 4.2.8         Test shutdown of the server PRIM (Ready)

- on Windows, check that the special procedure to stop modules at shutdown has been applied. Refer to section 10.4.
- shut down the ![](safekituserguideen_fichiers/image168.png) PRIM (Ready) server
- check the message in the log of the ![](safekituserguideen_fichiers/image168.png) SECOND (Ready) server:

  "Action alone called by heart: no heartbeat"

- the ![](safekituserguideen_fichiers/image168.png) SECOND (Ready) server becomes ![](safekituserguideen_fichiers/image168.png) ALONE (Ready); the application is started on the ALONE server by the start\_prim script of the module, with the message in the log:

  "Script start\_prim"

- after a timeout in the SafeKit console, the former ![](safekituserguideen_fichiers/image167.png) PRIM (Ready) server becomes ![](safekituserguideen_fichiers/image178.png) ERROR (connection error)
- after reboot of the stopped server, check that the OS shutdown has really called a shutdown of the module:

  "Action shutdown called by SYSTEM/root"

- check that the stop\_prim script of the application has been executed, with the message:

  "Script stop\_prim"

- and check that the module was completely stopped before the server shut down, with the last message:

  "Local state STOP NotReady"

- after reboot of the stopped server, if the module is started automatically at boot (safekit boot status), message in the log:

  "Action start called at boot time"

- after a start of the module on the stopped server, the module becomes ![](safekituserguideen_fichiers/image168.png) SECOND (Ready) on this server and ![](safekituserguideen_fichiers/image168.png) PRIM (Ready) on the other server

### 4.2.9         Test power-off of the server PRIM (Ready)

In the event of a power outage, the module is not stopped cleanly, as it would be during a server shutdown. Failover is triggered by the loss of heartbeats rather than by the detection of the module stop.

| Configuration | Expected result |
| --- | --- |
| userconfig.xml with 2 heartbeats:<br><heart> <heartbeat name="default" /> <heartbeat name="private" ident="flow" /> </heart><br><br>Note: if you want to test a simultaneous power failure on both servers, check that <rfs async="none"> is set in userconfig.xml. For more information, see section 13.7.3. | · in the log of the SECOND (Ready) server, messages:<br>"Resource heartbeat.default set to down by heart"<br>"Resource heartbeat.flow set to down by heart"<br>"Remote state UNKNOWN"<br>"Action alone called by heart: no heartbeat"<br>· the messages appear within 30 seconds after the power-off (if no specific timeout is configured for <heart>)<br>· the SECOND (Ready) server becomes ALONE (Ready); the application is restarted on the ALONE server by the start\_prim script of the module, with the message in its log:<br>"Script start\_prim"<br>· after a SafeKit console timeout, the former PRIM (Ready) server becomes ERROR (connection error)<br>· after reboot of the stopped server, if the module is started automatically at boot (safekit boot status), message in the log:<br>"Action start called at boot time"<br>· after a restart of the module on the stopped server, the module becomes SECOND (Ready) on this server and PRIM (Ready) on the other server |

### 4.2.10      Test split-brain with a mirror module

Split-brain occurs when the two SafeKit servers are isolated from each other on the network. Each server becomes primary (ALONE) and runs the application. When the network is restored, one of the two servers must be sacrificed by shutting down the application on it.

| Configuration and procedure | Expected result |
| --- | --- |
| Mirror module in the state PRIM (Ready) and SECOND (Ready).<br><br>userconfig.xml:<br><heart> <heartbeat name="default" /> <heartbeat name="repli" ident="flow" /> </heart><br>+<br>on Windows, to manage the IP conflict on the virtual IP address virtip:<br><vip> <interface\_list> <interface check="on" arpreroute="on"> <real\_interface> <virtual\_addr addr="192.168.1.10" where="one\_side\_alias"/> </real\_interface> </interface> </interface\_list> </vip><br><br>To obtain the split-brain, check that there are no checkers in userconfig.xml that can detect the network isolation: no <interface check="on">, no <ping> checker.<br><br>1. Disconnect the default and repli networks at the same time.<br>2. Reconnect the networks. | 1. After network isolation of both servers, all heartbeats are lost. In the logs of both servers:<br>"Resource heartbeat.default set to down by heart"<br>"Resource heartbeat.flow set to down by heart"<br>"Remote state UNKNOWN"<br>"Local state ALONE Ready"<br>Split-brain case: both servers are ALONE (Ready) and run the application.<br>2. When the networks are reconnected, one ALONE server is sacrificed: the former SECOND server.<br>Log of the former PRIM, not sacrificed:<br>"Remote state ALONE Ready"<br>"Split brain recovery: staying alone"<br>Log of the former SECOND, sacrificed:<br>"Remote state ALONE Ready"<br>"Split brain recovery: exiting alone"<br>"Script stop\_prim"<br>The server performs a stopstart: it stops the application with stop\_prim, then reintegrates the replicated files from the other server.<br>In Windows, upon reconnection, a conflict may occur on the virtual IP address, leading to a stop-start of the module.<br>3. Both servers come back to the stable state PRIM (Ready) and SECOND (Ready), as before the split-brain.<br><br>Note: a split-brain in a mirror module with file replication is harmful. The sacrifice of the former secondary server causes the reintegration of its files from the primary server, and the data written on the secondary during the split-brain is lost. For this reason, 2 heartbeats on two physically separate networks are recommended. Typically, a direct cable between the two servers makes it possible (1) to avoid split-brain with an additional heartbeat network and (2) to put the replication flow on a separate network. |
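After the reconnection, the outcome can be checked from the logs of both servers. A minimal Python sketch, assuming each server's module log is available as text; it looks only for the split brain recovery messages and the "Script stop\_prim" message listed above, and the function name is hypothetical.

```python
from typing import Dict, Optional

STAY = "Split brain recovery: staying alone"
EXIT = "Split brain recovery: exiting alone"


def sacrificed_node(logs: Dict[str, str]) -> Optional[str]:
    """Name of the node that exited ALONE and stopped the application.

    Returns None unless exactly one node stayed ALONE, exactly one other
    node exited ALONE, and that node logged "Script stop_prim" afterwards.
    """
    stayed = [node for node, text in logs.items() if STAY in text]
    exited = [node for node, text in logs.items() if EXIT in text]
    if len(stayed) != 1 or len(exited) != 1 or stayed == exited:
        return None
    text = logs[exited[0]]
    if text.find("Script stop_prim", text.find(EXIT)) < 0:
        return None
    return exited[0]
```

With the default behavior described above, the returned name should be the server that was SECOND before the split-brain.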

### 4.2.11      Continue your mirror module tests with checkers

Go to section 4.4 for tests of checkers.

## 4.3             Tests of a farm module

### 4.3.1         Test start of a farm module on all servers STOP (NotReady)

- message in the logs of all servers (to read logs, see section 7.4):

  "Action start called by admin@<IP>/SYSTEM/root"

- the module goes to ![](safekituserguideen_fichiers/image168.png) UP (Ready) on all servers
- the application is started by the start\_both script of the module on all servers, with the message in the log:

  "Script start\_both"

### 4.3.2         Test stop of a farm module on one server UP (Ready)

- message in the log of the stopped server (to read logs, see section 7.4):

  "Action stop called by admin@<IP>/SYSTEM/root"

- the stopped module runs the stop\_both script, which stops the application on this server, with the message in the log:

  "Script stop\_both"

- the stopped module becomes ![](safekituserguideen_fichiers/image180.jpg) STOP (NotReady), with the message in the log:

  "Local state STOP NotReady"

- the other servers stay ![](safekituserguideen_fichiers/image181.jpg) UP (Ready) and continue to run the application
- start the ![](safekituserguideen_fichiers/image182.jpg) STOP (NotReady) module again with the start command

### 4.3.3         Test restart of a farm module on one server UP (Ready)

- message in the log of the module where the restart command is run (to read logs, see section 7.4):

  "Action restart called by admin@<IP>/SYSTEM/root"

- the restarted module becomes ![](safekituserguideen_fichiers/image172.png) UP (Transient), then ![](safekituserguideen_fichiers/image184.jpg) UP (Ready)
- the module scripts stop\_both and start\_both are executed on the server to restart the application locally, with the messages in the log:

  "Script stop\_both"  
  "Script start\_both"

### 4.3.4         Test virtual IP address of a farm module

The tests below use the arp command, which is specific to IPv4. For an IPv6 VIP or for issues with VIP ↔ MAC resolution, see section 7.24.

#### 4.3.4.1      Configuration with vmac\_directed

| Configuration and procedure | Expected result |
| --- | --- |
| Farm module in the UP (Ready) state on 2 servers node1 and node2.<br><br>userconfig.xml with load balancing on the safewebserver service (TCP port 9010):<br><farm> <lan name="default" /> </farm><br><vip> <interface\_list> <interface check="on"> <virtual\_interface type="vmac\_directed"> <virtual\_addr addr="virtip" where="alias" check="on"/> </virtual\_interface> </interface> </interface\_list> <loadbalancing\_list> <group name="FarmProto"> <rule port="9010" proto="tcp" filter="on\_port"/> </group> </loadbalancing\_list> </vip><br><br>On a remote workstation (or server) in the same LAN, ping the 2 physical IP addresses and the virtual IP address, then run arp -a. | · In the verbose log of all servers:<br>"Virtual IP <virtip of farm> set"<br>· On the 2 servers, ipconfig /all (Windows) or ip addr show (Linux) returns virtip as an alias on the network interface.<br>· On the remote workstation (or server), the pings respond, and virtip is mapped to the MAC address of one of the 2 servers:<br>ping node1\_ip\_address<br>ping node2\_ip\_address<br>ping virtip<br>arp -a<br>node1\_ip\_address 00-0c-29-0a-5c-fc<br>node2\_ip\_address 00-0c-29-26-44-93<br>virtip 00-0c-29-26-44-93 |

 

#### 4.3.4.2      Configuration with vmac\_invisible

| Configuration and procedure | Expected result |
| --- | --- |
| Farm module in the UP (Ready) state on 2 servers node1 and node2.<br><br>userconfig.xml with load balancing on the safewebserver service (TCP port 9010):<br><farm> <lan name="default" /> </farm><br><vip> <interface\_list> <interface check="on"> <virtual\_interface type="vmac\_invisible"> <virtual\_addr addr="virtip" where="alias" check="on"/> </virtual\_interface> </interface> </interface\_list> <loadbalancing\_list> <group name="FarmProto"> <rule port="9010" proto="tcp" filter="on\_port"/> </group> </loadbalancing\_list> </vip><br><br>On a remote workstation (or server) in the same LAN, ping the 2 physical IP addresses and the virtual IP address, then run arp -a. | · In the verbose log of all servers:<br>"Virtual IP <virtip of farm> set"<br>· On the 2 servers, ipconfig /all (Windows) or ip addr show (Linux) returns virtip as an alias on the network interface.<br>· On the remote workstation (or server), the pings respond, and virtip is mapped to the invisible virtual MAC address:<br>ping node1\_ip\_address<br>ping node2\_ip\_address<br>ping virtip<br>arp -a<br>node1\_ip\_address 00-0c-29-0a-5c-fc<br>node2\_ip\_address 00-0c-29-26-44-93<br>virtip 5a-fe-c0-a8-38-14<br>· Note: by default, the virtual MAC address is a unicast Ethernet address built with 5A:FE (SAFE) followed by the virtual IP address in hexadecimal. |
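The default virtual MAC address described in the note can be derived from the virtual IP address, which helps when reading ARP tables or network captures. A minimal Python sketch that only applies the rule stated above (the 5A:FE prefix followed by the four IPv4 bytes in hexadecimal); it ignores any non-default configuration, and the function names are hypothetical. In the example above, 5a-fe-c0-a8-38-14 corresponds to the virtual IP address 192.168.56.20.

```python
import ipaddress


def default_vmac(virtual_ip: str) -> str:
    """Default invisible virtual MAC: 5a-fe followed by the IPv4 bytes in hex."""
    ip_bytes = ipaddress.IPv4Address(virtual_ip).packed  # 4 bytes, network order
    mac_bytes = bytes([0x5A, 0xFE]) + ip_bytes
    return "-".join(f"{b:02x}" for b in mac_bytes)


def is_unicast(mac: str) -> bool:
    """A MAC is unicast when the least significant bit of its first octet is 0."""
    return (int(mac.split("-")[0], 16) & 0x01) == 0
```

The first octet 0x5A has this bit cleared, which is consistent with the note that the address is unicast.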

### 4.3.5         Test TCP load balancing on a virtual IP address

| Configuration and procedure | Expected result |
| --- | --- |
| Farm module in the state UP (Ready) on the 2 servers node1 and node2.<br>Same load balancing configuration in userconfig.xml as in the previous test.<br><br>On a remote workstation:<br>1. Connect a browser to http://virtip:9010/safekit/mosaic.html, then fill in the module name and click Mosaic Test. node1 and node2 respond.<br>2. Run safekit stop -m *AM* on node2 (where *AM* is the module name). Reload the URL: node1 responds.<br><br>Special command to check the load balancing bitmap for port 9010 on each UP (Ready) node:<br>safekit -r vip\_if\_ctrl -l<br>Each entry of the 256-bit bitmap must be 1 on a single server. Furthermore, the 256 bits are fairly distributed among the bitmaps of all UP (Ready) servers (if no power is defined in userconfig.xml). | 1. UP (Ready) on the 2 servers: TCP sessions are load balanced between node1 and node2 when loading the URL.<br>o In the resources of the module, for node1 and node2: FarmProto\_0 50%<br>o In the verbose logs of node1 and node2:<br>"farm membership: **node1 node2** (group FarmProto\_0)"<br>"farm load: **128/256** (group FarmProto\_0)"<br>128/256: 128 of the 256 bits are managed by each server<br>o safekit -r vip\_if\_ctrl -l on node1 and node2:<br>With type="vmac\_directed":<br>Bitmap node1: 01010101:01010101:01010101:01010101:ffffffff:ffffffff:ffffffff:ffffffff<br>Bitmap node2: ffffffff:ffffffff:ffffffff:ffffffff:02020202:02020202:02020202:02020202<br>01 and 02 correspond to the numbers of the nodes that reply.<br>With type="vmac\_invisible":<br>Bitmap node1: 00000000:00000000:00000000:00000000:ffffffff:ffffffff:ffffffff:ffffffff<br>Bitmap node2: ffffffff:ffffffff:ffffffff:ffffffff:00000000:00000000:00000000:00000000<br>The bits are fairly distributed between both servers.<br>2. STOP (NotReady) on node2: TCP sessions are served only by node1 when loading the URL.<br>o In the resources of the module, for node1: FarmProto\_0 100%<br>o In the verbose log of node1:<br>"farm membership: **node1** (group FarmProto\_0)"<br>"farm load: **256/256** (group FarmProto\_0)"<br>256/256: all the bits are managed by node1<br>o safekit -r vip\_if\_ctrl -l on node1:<br>Bitmap: ffffffff:ffffffff:ffffffff:ffffffff:ffffffff:ffffffff:ffffffff:ffffffff<br>All the bits are managed by node1. |
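The "farm load" messages give a quick way to check the distribution from the verbose logs. A minimal Python sketch, assuming each node's verbose log is available as text and the message keeps the format "farm load: <owned>/256 (group <name>)" shown above; the function names are hypothetical. The fairness criterion used here (owned bits add up to 256 and shares differ by at most one bit) is an assumption for the case where no power is defined; it checks totals only, not which bits each node owns.

```python
import re
from typing import Dict

# e.g. "farm load: 128/256 (group FarmProto_0)"
_LOAD = re.compile(r"farm load:\s*(\d+)/(\d+)\s*\(group\s+([^)\s]+)\)")


def last_farm_load(log_text: str, group: str = "FarmProto_0") -> int:
    """Number of bits owned by the node in its last 'farm load' message."""
    owned = None
    for bits, _total, name in _LOAD.findall(log_text):
        if name == group:
            owned = int(bits)
    if owned is None:
        raise ValueError(f"no 'farm load' message for group {group}")
    return owned


def check_fair_split(loads: Dict[str, int], total: int = 256) -> bool:
    """Owned bits add up to `total` and shares differ by at most 1 bit."""
    values = list(loads.values())
    return bool(values) and sum(values) == total and max(values) - min(values) <= 1
```

During the split-brain test of section 4.3.6, each isolated node reports 256/256, so check_fair_split returns False, which is the expected symptom of the isolation.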

### 4.3.6         Test split-brain with a farm module

Split-brain occurs when SafeKit servers are isolated from each other on the network.

| Configuration and procedure | Expected result |
| --- | --- |
| Farm module UP (Ready) on the servers node1 and node2.<br>Same load balancing configuration in userconfig.xml as in the previous test. To get the split-brain, check in userconfig.xml that there are no checkers that can detect the isolation: no <interface check="on"> and no <ping> checker.<br><br>On the external workstation:<br>1. Connect a browser to http://virtip:9010/safekit/mosaic.html, then click Mosaic Test. node1 and node2 respond.<br>2. Disconnect the network between node1 and node2. Depending on where the external workstation is located, node1 or node2 responds.<br>3. Reconnect the network and connect to the URL.<br><br>Same special command as in the previous test to check the load balancing bitmap for port 9010 on each UP (Ready) node. | 1. Before split-brain, state UP (Ready) on node1 and node2:<br>o In the resources of the module, for node1 and node2: FarmProto\_0 50%<br>o In the verbose logs of node1 and node2:<br>"farm membership: **node1 node2** (group FarmProto\_0)"<br>"farm load: **128/256** (group FarmProto\_0)"<br>128/256: 128 of the 256 bits are managed by each server.<br>o safekit -r vip\_if\_ctrl -l on node1 and node2:<br>With type="vmac\_directed":<br>Bitmap node1: 01010101:01010101:01010101:01010101:ffffffff:ffffffff:ffffffff:ffffffff<br>Bitmap node2: ffffffff:ffffffff:ffffffff:ffffffff:02020202:02020202:02020202:02020202<br>01 and 02 correspond to the numbers of the nodes that reply.<br>With type="vmac\_invisible":<br>Bitmap node1: 00000000:00000000:00000000:00000000:ffffffff:ffffffff:ffffffff:ffffffff<br>Bitmap node2: ffffffff:ffffffff:ffffffff:ffffffff:00000000:00000000:00000000:00000000<br>The bits are fairly distributed between both servers.<br>2. After isolation of the servers, split-brain:<br>o In the resources of the module, for node1 and node2: FarmProto\_0 100%<br>o In the verbose log of node1:<br>"farm membership: **node1** (group FarmProto\_0)"<br>"farm load: **256/256** (group FarmProto\_0)"<br>256/256: all the bits are managed by node1<br>o In the verbose log of node2:<br>"farm membership: **node2** (group FarmProto\_0)"<br>"farm load: **256/256** (group FarmProto\_0)"<br>256/256: all the bits are managed by node2<br>o safekit -r vip\_if\_ctrl -l on node1 and node2:<br>Bitmap: ffffffff:ffffffff:ffffffff:ffffffff:ffffffff:ffffffff:ffffffff:ffffffff<br>3. After the split-brain, when the network is reconnected, the log contains the same messages and the bitmaps are the same as before the split-brain.<br><br>Note: the default behavior of a farm module in a split-brain situation is good. The recommendation is to define in userconfig.xml a monitoring network <lan> </lan> on the network where the virtual IP address is.<br>The messages in the log and the result of vip\_if\_ctrl are slightly different depending on the type, vmac\_directed or vmac\_invisible. |

 

### 4.3.7         Test compatibility of the network with invisible MAC address (vmac\_invisible)

| Prerequisites and tools | Test procedure and expected result |
| --- | --- |
| Network prerequisite<br>A unicast Ethernet MAC address 5a-fe-xx-xx-xx-xx is associated with the virtual IP address virtip of a farm module. SafeKit servers never present it as a source Ethernet address (invisible MAC), so switches cannot learn where it is located. When they forward a packet to the destination MAC address 5a-fe-xx-xx-xx-xx, they must broadcast the packet on all ports of the LAN or VLAN where the virtual IP address is (flooding). All servers in the farm therefore receive the packets destined to the virtual MAC address 5a-fe-xx-xx-xx-xx.<br>Note that this prerequisite does not exist for a mirror module (see section 4.2.6).<br><br>Server prerequisite<br>The packets are captured by the Ethernet cards, which SafeKit sets in promiscuous mode, and are filtered by the module kernel <vip> according to the load balancing bitmap. To run the test, you need a network monitoring tool.<br><br>Network monitoring on Windows 2003 (CD2):<br>1. Install "Network Monitor Tools" in "Management and Monitoring Tools" (captures only the packets whose source or destination is the server).<br>2. Start / Network Monitor, then Capture Filter / Address Pairs / virtip, then Capture / Start, then "Stop and View" at the end of the capture.<br><br>Network monitoring on Linux:<br>1. tcpdump host virtip captures the network packets sent to or from virtip. | 1. All servers are UP (Ready).<br>2. Network monitoring is started on each server with a filter on virtip.<br>3. An external workstation sends a single ping to the virtual IP address with ping -n 1 virtip (Windows) or ping -c 1 virtip (Linux):<br>o 1 packet is sent and received by all servers:<br>"ICMP: Echo: From extip To virtip"<br>o there must be as many packets<br>"ICMP: Echo Reply: To extip From virtip"<br>as there are UP (Ready) servers<br>4. If this is not the case, check whether options in the switches restrict "port flooding" and prevent the broadcast of "ICMP: Echo" to all servers.<br>5. Be careful: the "port flooding" restriction in switches can start only after a certain amount of flooding (time, number of KB flooded): repeat the ping test over several hours while generating flooding to the virtual IP address.<br>6. Note: instead of network monitoring tools, an external Linux console can be used. The Linux ping prints the duplicate packets coming from the 2 UP (Ready) servers:<br>ping virtip<br>64 bytes from virtip icmp\_seq=1<br>64 bytes from virtip icmp\_seq=1 (DUP!)<br>64 bytes from virtip icmp\_seq=2<br>64 bytes from virtip icmp\_seq=2 (DUP!)...<br>This test may be run for several minutes by storing the output of ping in a file and then checking that (DUP!) appeared all the time:<br>date > /tmp/ping.txt ; ping virtip >> /tmp/ping.txt |

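Checking a long ping capture for (DUP!) by eye is error-prone. The sketch below assumes the file holds standard Linux ping reply lines such as `64 bytes from ...: icmp_seq=1 ... (DUP!)`; it counts the replies received for each `icmp_seq` and lists the sequence numbers that got fewer replies than there are servers UP (Ready), including sequence numbers with no reply at all. The script name `check_ping.py` and the function name are hypothetical; /tmp/ping.txt is the file used in the test above.

```python
import re
import sys
from collections import Counter

# A reply line of Linux ping, e.g. "64 bytes from ...: icmp_seq=12 ttl=64 time=0.4 ms (DUP!)".
REPLY = re.compile(r"bytes from .*icmp_seq=(\d+)")


def missing_duplicates(lines, servers_up):
    """Return the icmp_seq numbers that received fewer than servers_up replies.

    Sequence numbers with no reply at all (between the first and last seen) are included.
    """
    replies = Counter()
    for line in lines:
        match = REPLY.search(line)
        if match:
            replies[int(match.group(1))] += 1
    if not replies:
        raise ValueError("no ping replies found")
    seqs = range(min(replies), max(replies) + 1)
    return [seq for seq in seqs if replies[seq] < servers_up]


# Hypothetical usage: python3 check_ping.py /tmp/ping.txt 2
if __name__ == "__main__" and len(sys.argv) > 1:
    servers = int(sys.argv[2]) if len(sys.argv) > 2 else 2
    with open(sys.argv[1]) as f:
        bad = missing_duplicates(f, servers)
    if bad:
        print(f"icmp_seq without {servers} replies: {bad}")
    else:
        print("OK: every echo request was answered by all servers")
```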
### 4.3.8         Test shutdown of a server UP (Ready)

·        
on Windows, check that the special procedure to
stop modules at shutdown has been performed. Refer to section
10.4.

·        
shut down a ![](safekituserguideen_fichiers/image195.jpg)UP (Ready)
server

·        
the other servers stay ![](safekituserguideen_fichiers/image184.jpg)UP (Ready)
and continue to run the application

·        
after a timeout, the SafeKit console shows the former
server ![](safekituserguideen_fichiers/image168.png)UP (Ready)
as ![](safekituserguideen_fichiers/image196.png)ERROR (connection error)

·        
after reboot, check in the module log that the OS shutdown
called a shutdown of the module, with the message

"Action shutdown called by SYSTEM"

·        
check that the stop\_both script,
which stops the application, has been executed, with the message

"Script stop\_both"

·        
check that the module was completely
stopped before the server stopped, with the last message

"Local state STOP NotReady"

·        
after the stopped server reboots, if the
module starts automatically at boot (safekit boot status),
the log contains the message

"Action start called at boot time"

·        
once the module has started on the previously stopped
server, it becomes ![](safekituserguideen_fichiers/image179.png)UP (Ready)
and executes the start\_both script, which restarts the application on this server; the log contains the
message

"Script start\_both"

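Once the module log has been saved to a text file (see section 7.4 for how to read it), the order of the messages quoted in this test can be checked with a short script. This is a sketch under the assumption of one log message per line, matched as a substring so that timestamps and prefixes are ignored; the function names and the file name `module_log.txt` are hypothetical. It checks only the order of the messages, not that "Local state STOP NotReady" is the last one before the server stops.

```python
def expected_messages(started_at_boot=True):
    """Messages of the shutdown test, in the order they should appear in the module log."""
    messages = [
        "Action shutdown called by SYSTEM",  # OS shutdown stops the module
        "Script stop_both",                  # application stopped
        "Local state STOP NotReady",         # module fully stopped before the server
    ]
    if started_at_boot:
        messages.append("Action start called at boot time")
    messages.append("Script start_both")     # application restarted after reboot
    return messages


def first_missing(log_lines, expected):
    """Return the first expected message not found, in order, in log_lines (None if all found)."""
    lines = list(log_lines)
    position = 0
    for message in expected:
        while position < len(lines) and message not in lines[position]:
            position += 1
        if position == len(lines):
            return message
        position += 1  # the next message must appear on a later line
    return None


# Hypothetical usage, with the log saved to module_log.txt:
# with open("module_log.txt") as f:
#     print(first_missing(f, expected_messages()) or "all messages found in order")
```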
### 4.3.9         Test power-off of a server UP (Ready)

In the event of a power outage, the module
is not stopped cleanly, as it is during a server shutdown. Failover is
triggered by the loss of heartbeats rather than by the detection of the module stop.
To run the test, power off a UP (Ready) server.

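The paragraph above hinges on timeout-based failure detection: a powered-off server simply stops sending heartbeats, and the other servers declare it down only once a timeout has elapsed, which is also why the console shows ERROR (connection error) "on timeout". The sketch below is a generic illustration of that principle, not SafeKit's implementation; the class name and the 5-second timeout are hypothetical, and a fake clock replaces real time so the example is deterministic.

```python
import time


class HeartbeatMonitor:
    """Declare a peer down when no heartbeat has arrived for more than `timeout` seconds.

    Generic illustration only: the timeout value is an assumption, not a SafeKit default.
    """

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock          # injectable so the example can use a fake clock
        self.last_seen = {}         # peer name -> time of its last heartbeat

    def heartbeat(self, peer):
        """Record a heartbeat received from `peer`."""
        self.last_seen[peer] = self.clock()

    def down_peers(self):
        """Return, sorted, the peers whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(p for p, t in self.last_seen.items() if now - t > self.timeout)


# Simulated power-off of node1 with a fake clock and a hypothetical 5-second timeout.
fake_now = [0.0]
monitor = HeartbeatMonitor(timeout=5.0, clock=lambda: fake_now[0])
monitor.heartbeat("node1")
monitor.heartbeat("node2")
fake_now[0] = 4.0
monitor.heartbeat("node2")      # node1 is powered off and sends nothing more
fake_now[0] = 6.0
print(monitor.down_peers())     # ['node1']
```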
·        
the other servers stay ![](safekituserguideen_fichiers/image181.jpg)UP (Ready)
and continue to run the application

·        
after a timeout, the SafeKit console shows the former
server ![](safekituserguideen_fichiers/image167.png)UP (Ready)
as ![](safekituserguideen_fichiers/image196.png)ERROR (connection error)

·        
after the stopped server reboots, if the
module starts automatically at boot (safekit boot status),
the log contains the message

"Action start called at boot time"

·        
once the module has started on the previously stopped
server, it becomes ![](safekituserguideen_fichiers/image185.jpg)UP (Ready)
and executes the start\_both script, which restarts the application on this server; the log contains the
message

"Script start\_both"

### 4.3.10      Continue your farm module tests with checkers

Go to section 4.4 for tests of checkers.

