---
canonical: https://safekit.evidian.com/wp-content/uploads/downloads_safekit/version-82/safekituserguidehtml/documentation/safekituserguideen.htm
---

# 1. Technical overview

![*](safekituserguideen_fichiers/image001.png)       Section 1.1 “Generalities, solutions, architectures”

![*](safekituserguideen_fichiers/image001.png)       Section 1.2 “The SafeKit mirror cluster”

![*](safekituserguideen_fichiers/image001.png)       Section 1.3 “The SafeKit farm cluster”

![*](safekituserguideen_fichiers/image001.png)       Section 1.4 “Clusters running several modules”

![*](safekituserguideen_fichiers/image001.png)       Section 1.5 “The SafeKit Hyper-V or KVM cluster”

![*](safekituserguideen_fichiers/image001.png)       Section 1.6 “SafeKit clusters in the cloud”

## 1.1             Generalities, solutions, architectures

### 1.1.1         Introduction to SafeKit

SafeKit is a high availability software
solution designed to ensure 24/7 uptime for business-critical applications. It
supports both Windows and Linux platforms and eliminates the need for shared
disks, enterprise editions of databases, or advanced technical skills, making
it a cost-effective alternative to traditional clustering solutions.

Key Features:

·        
Real-Time Synchronous Replication: Continuous
data replication across nodes to prevent data loss.

·        
Automatic Failover and Failback: Seamless switch
to a secondary system during failures and reversion once the original system is
operational.

·        
Load Balancing: Optimizes resource use by
distributing workloads across multiple servers.

·        
Platform Agnostic: Compatible with physical
machines, virtual machines, and public cloud infrastructures.

Key Advantages:

·        
Zero Specific Skills: No specialized IT skills
required for deployment.

·        
Zero Hardware Overhead: No need for specific
hardware like shared disks or load balancers.

·        
Zero Software Overhead: Works with standard
editions of Windows and Linux.

Key Solutions:

·        
Application Level: High availability with
restart scripts per application.

·        
Hypervisor Level: High availability without
restart scripts per application.

·        
Container or Pod Level: High availability
without restart scripts per application.

SafeKit
is ideal for software publishers, resellers, and distributors looking to
enhance their products with high availability features. It also offers an OEM
opportunity for partners to integrate SafeKit into their own applications.

### 1.1.2         SafeKit solutions

 See here for a list of SafeKit solutions.

|  |  |
| --- | --- |
| **Application-level HA**  In this type of solution, only application data is replicated, and only the application is restarted in case of a failure.    Integration tasks must be implemented, including defining the list of services to restart in case of failover, specifying data folders for replication, configuring software checkers, and setting up a virtual IP address.  This solution is platform-independent and works with applications running on physical machines, on virtual machines, or in the cloud. Any hypervisor is supported (e.g., VMware, Hyper-V). | **Virtual machine-level HA**  In this type of solution, the entire virtual machine (VM) is replicated, including the application and OS. The complete virtual machine is restarted in case of a failure.    The advantage is that there is no need to have in-depth knowledge of the application (services to restart, location of application data to replicate), and no virtual IP address needs to be configured.  This solution works with Windows/Hyper-V and Linux/KVM but not with VMware. This is an active/active solution with multiple virtual machines replicated and restarted between the two nodes. |

Note: Applications
running in containers or pods also do not require dedicated restart scripts.
SafeKit provides generic restarts and real-time replication of persistent data
for these environments (see the list of SafeKit solutions).

### 1.1.3         SafeKit architectures

SafeKit offers two basic high availability
clusters for Windows and Linux:

·        
the mirror cluster, with real-time file
replication and failover, built by deploying a mirror module on 2 servers,

·        
the farm cluster, with network load
balancing and failover, built by deploying a farm module on 2 servers or more.

Several modules can be deployed on the same
cluster. Thus, advanced clustering architectures can be implemented:

·        
the farm+mirror cluster built by deploying a
farm module and a mirror module on the same cluster,

·        
the active/active cluster built by deploying
several mirror modules on 2 servers,

·        
the N-1 cluster built by deploying N mirror
modules on N+1 servers.

Two specific clusters are also worth
considering with SafeKit:

·        
the Hyper-V or KVM cluster with real-time
replication and failover of entire virtual machines between 2 active
hypervisors,

·        
mirror or farm clusters in the Cloud.

### 1.1.4         SafeKit cluster definition

A SafeKit cluster is a set of servers where
SafeKit is installed and running.

All servers within a given SafeKit cluster
share the same cluster configuration, which includes the list of servers and
networks used. These servers communicate with each other to maintain a global
view of the configurations of the SafeKit modules. A server cannot belong
to multiple SafeKit clusters simultaneously.

Configuring the cluster is a prerequisite
for installing and configuring SafeKit modules. This can be done
using the SafeKit web console or through online commands.

### 1.1.5         SafeKit application module definition

An application module is a customization of
SafeKit for a specific application or hypervisor. See here for a list of modules and their quick installation guides.

**Types of Modules**

·        
Generic farm and mirror modules for new
applications,

·        
Preconfigured application modules for databases,
web servers…,

·        
Hypervisor modules (hyperv.safe, kvm.safe) for
real-time replication and restart of entire virtual machines.

**Module Contents**

In practice, an application module is a
“.safe” file (zip type) that includes:

·        
The configuration file userconfig.xml, which
contains:

o    The virtual IP address (not necessary for a hypervisor module),

o    File directories to replicate in real time (for a mirror module),

o    Network load balancing criteria (for a farm module),

o    Configuration of software and hardware failure detectors,

·        
The scripts to start and stop an application or
a virtual machine.

**Deployment Steps**

Once the application module is configured
and tested, deployment requires no specific IT skills:

·        
Install the application or the hypervisor on 2
standard servers,

·        
Install the SafeKit software on both servers,

·        
Install the module on both servers.

Configuring, deploying, and monitoring
modules can be done using the SafeKit web console or through online commands.

### 1.1.6         SafeKit limitations

| **Typical usage with SafeKit** | **Limitation** | **Alternative** |
| --- | --- | --- |
| Replication of a few terabytes | Resynchronization after a failure takes too long.  On a 1 Gb/s network, 3 hours for 1 terabyte.  On a 10 Gb/s network, 1 hour or less for 1 terabyte (depends on disk write performance). | Use shared storage. |
| Replication < 1 million files | Resynchronization after a failure takes too long, because each file must be checked between both nodes. | Put files in a virtual hard disk replicated by SafeKit. |
| Replication <= 32 virtual machines | In full virtual machine replication mode, with one virtual machine per mirror module, the limit is 32 modules per cluster. | Use another HA solution with shared storage. |
| 1 or 10 Gb/s LAN or extended LAN | Failover of the virtual IP address is built-in when in the same subnet.  A LAN provides adequate bandwidth for resynchronization.  A LAN provides adequate latency (typically a round-trip of less than 2 ms) for synchronous replication. | Use backup solutions with asynchronous replication. |
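
The resynchronization figures above can be compared with a simple bandwidth calculation. The Python sketch below is only an estimate: the function name `resync_hours` and its `efficiency` parameter are assumptions introduced here to model protocol overhead and disk limits. They are not SafeKit settings.

```python
def resync_hours(data_bytes: float, link_bits_per_s: float, efficiency: float = 1.0) -> float:
    """Estimate the hours needed to copy data_bytes over a network link.

    efficiency is a hypothetical factor (0 < efficiency <= 1) standing in for
    protocol overhead and disk write limits; 1.0 gives the raw lower bound.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = data_bytes * 8 / (link_bits_per_s * efficiency)
    return seconds / 3600


TB = 10**12      # 1 terabyte in bytes
GBPS = 10**9     # 1 Gb/s in bits per second

print(round(resync_hours(TB, GBPS), 2))         # 2.22 (raw lower bound at 1 Gb/s)
print(round(resync_hours(TB, 10 * GBPS), 2))    # 0.22 (raw lower bound at 10 Gb/s)
print(round(resync_hours(TB, GBPS, 0.75), 2))   # 2.96 (with an assumed 75% efficiency)
```

The raw bound for 1 terabyte at 1 Gb/s is a little over 2 hours, so the 3 hours quoted above corresponds to roughly three quarters of the nominal bandwidth. At 10 Gb/s the network is rarely the bottleneck, which is why the quoted figure depends on disk write performance.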

## 1.2             The SafeKit mirror cluster

### 1.2.1         Real time file replication and application failover

The mirror cluster is an active-passive
high-availability solution, built by deploying a mirror module within a
two-node cluster. The application runs on a primary server and is restarted
automatically on a secondary server if the primary server fails.

With its real-time file replication
function, this architecture is particularly suited to providing high
availability for back-end applications with critical data to protect against
failure.

Microsoft SQL Server, PostgreSQL, MariaDB,
Oracle, Milestone, Nedap, Docker, Podman, Hyper-V, and KVM solutions are
examples of mirror modules. You can create your own mirror module for your
application based on the generic mirror.safe module. See here for a list of modules.

Note that Hyper-V and KVM mirror modules
replicate entire virtual machines, including applications and operating
systems. They do not require a virtual IP, as the VM restart handles the
failover of the VM physical IP address.

The mirror cluster works as follows.

### 1.2.2         Step 1. Normal operation

![](safekituserguideen_fichiers/image012.jpg)

Server 1 (PRIM) runs the application.

SafeKit replicates files opened by the
application. Only changes made by the application in the files are replicated
in real time across the network, thus limiting traffic.

For replication, only the names of the
directories to replicate are configured in SafeKit. There are no prerequisites
on disk organization for the two servers. Directories to replicate may be
located on the system disk.

### 1.2.3         Step 2. Failover

**![](safekituserguideen_fichiers/image013.jpg)**

When Server 1 fails, Server 2 takes over.
SafeKit switches the virtual IP address and restarts the application
automatically on Server 2. The application finds the files replicated by
SafeKit up-to-date on Server 2, thanks to the synchronous replication between
Server 1 and Server 2. The application continues to run on Server 2 by locally
modifying its files that are no longer replicated to Server 1.

The switch-over time is equal to the
fault-detection time (set to 30 seconds by default) plus the application
start-up time. Unlike disk replication solutions, there is no delay for
remounting file systems and running recovery procedures.

### 1.2.4         Step 3. Failback and automatic resynchronization

**![](safekituserguideen_fichiers/image014.jpg)**

Failback involves restarting Server 1
after fixing the problem that caused it to fail. SafeKit automatically
resynchronizes the files, updating only the files modified on Server 2 while
Server 1 was halted.

This automatic reintegration takes place
without stopping the application, which can continue running on Server 2. This
is a major feature that differentiates SafeKit from other solutions, which
require manual operations to reintegrate Server 1 in the cluster.

### 1.2.5         Step 4. Return to normal operation

![](safekituserguideen_fichiers/image015.png)

After reintegration, the files are once
again in mirror mode, as in step 1. The system is back in high-availability
mode, with the application running on Server 2 and SafeKit replicating file
updates to Server 1.

If administrators want the application to
run on Server 1, they can execute a ‘Stop/Start’ command on the PRIM server
either through the console at the appropriate time or automatically by
configuring a default primary server.
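
The four steps above can be read as a small state machine. The Python sketch below is only an illustration of that reading: the function name `apply_event`, the event names, and the `STOP` placeholder for a halted server are assumptions made here, and resynchronization is treated as instantaneous.

```python
def apply_event(states: dict[str, str], event: str, server: str) -> dict[str, str]:
    """Return the new state of a two-node mirror cluster after one event."""
    other = next(name for name in states if name != server)
    new = dict(states)
    if event == "fail":
        new[server] = "STOP"              # placeholder for a halted server
        if states[other] in ("PRIM", "SECOND"):
            new[other] = "ALONE"          # the survivor runs the application alone
    elif event == "restart":
        if states[server] == "STOP" and states[other] == "ALONE":
            new[server] = "SECOND"        # reintegrated after resynchronization
            new[other] = "PRIM"           # the application keeps running here
    else:
        raise ValueError(f"unknown event: {event}")
    return new


cluster = {"server1": "PRIM", "server2": "SECOND"}      # step 1
cluster = apply_event(cluster, "fail", "server1")       # step 2
print(cluster)  # {'server1': 'STOP', 'server2': 'ALONE'}
cluster = apply_event(cluster, "restart", "server1")    # steps 3 and 4
print(cluster)  # {'server1': 'SECOND', 'server2': 'PRIM'}
```

After the restart, the roles are swapped compared with step 1, which matches the description of step 4: the application stays on Server 2 until an administrator decides otherwise.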

### 1.2.6         Synchronous replication versus asynchronous replication

There is a significant difference between
synchronous replication, as offered by the SafeKit mirror solution, and
asynchronous replication traditionally offered by other file replication
solutions.

With synchronous replication, when a disk
IO is performed by the application on the primary server inside a replicated
file, SafeKit waits for the IO acknowledgement from the local disk and from the
secondary server, before sending the IO acknowledgement to the application.
This mechanism is essential for recovery of transactional applications.

Synchronous data replication requires the latency of a LAN (typically a
round-trip of less than 2 ms) between the servers. This can be an extended LAN
between two geographically remote computer rooms.

With asynchronous replication implemented
by other solutions, the IOs are placed in a log on the primary server but the
primary server does not wait for the IO acknowledgments of the secondary
server. Thus, all data that has not been copied over the network to the second
server is lost in the event of a failure of the first server.

In particular, a transactional application
may lose committed data in the event of a failure. Asynchronous replication can
be used for data replication over a low-speed WAN to back up data remotely, but
it is not suitable for high availability with automatic failover.

SafeKit provides a semi-synchronous
solution, implementing the asynchrony not on the primary server but on the
secondary one. In this solution, SafeKit always waits for the acknowledgement
of the two servers before sending the acknowledgement to the application. On
the secondary, however, there are two options: asynchronous or synchronous. In the
asynchronous case, the secondary sends the acknowledgement to the primary upon
receipt of the IO and writes to disk afterwards. In the synchronous case, the
secondary writes the IO to disk and then sends the acknowledgement to the
primary. The synchronous mode is required to cover a simultaneous
power outage of both servers in which the former primary
server cannot be restarted and the application must restart on the secondary.
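
The acknowledgement ordering described in this section can be sketched with a small in-memory model. This is not SafeKit code: the class names `Primary` and `Secondary`, the `sync_to_disk` flag, and the helper `data_on_secondary_after_double_outage` are hypothetical names chosen for the illustration, and Python lists stand in for disks.

```python
from dataclasses import dataclass, field


@dataclass
class Secondary:
    sync_to_disk: bool                              # True: write, then acknowledge
    disk: list = field(default_factory=list)
    pending: list = field(default_factory=list)     # received but not yet on disk

    def receive(self, io: bytes) -> str:
        if self.sync_to_disk:
            self.disk.append(io)       # synchronous option: on disk before the ack
        else:
            self.pending.append(io)    # asynchronous option: ack on receipt
        return "ack"


@dataclass
class Primary:
    secondary: Secondary
    disk: list = field(default_factory=list)

    def write(self, io: bytes) -> str:
        self.disk.append(io)                 # acknowledgement from the local disk
        ack = self.secondary.receive(io)     # wait for the secondary's acknowledgement
        assert ack == "ack"
        return "ack"                         # only now is the application acknowledged


def data_on_secondary_after_double_outage(sync_to_disk: bool) -> list:
    """Simulate one committed write followed by a power outage on both servers."""
    secondary = Secondary(sync_to_disk)
    Primary(secondary).write(b"commit-1")
    secondary.pending.clear()                # memory content is lost with the power
    return secondary.disk


print(data_on_secondary_after_double_outage(True))    # [b'commit-1']
print(data_on_secondary_after_double_outage(False))   # []
```

In both options the application is acknowledged only after the secondary has answered, so a failure of the primary alone loses nothing. The difference appears only in the double power outage case, where the asynchronous option can lose a write that was acknowledged but not yet on the secondary's disk.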

### 1.2.7         Behavior in case of network isolation

A **heartbeat** is a mechanism for
synchronizing two servers and detecting failures by exchanging data over a
shared network. If one server loses all heartbeats, it assumes the other is
down and runs the application ALONE.

SafeKit supports multiple heartbeats across
shared networks. A dedicated network with a second heartbeat can prevent
network isolation and be used as the replication network.

**Network Isolation**:

·        
Upon losing all heartbeats, both servers
transition to the ALONE state, running the application independently.

·        
After the isolation, one server stops and
resynchronizes data from the other server.

·        
The cluster returns to PRIM-SECOND state.

**Splitbrain Checker**:

·        
Uses a witness IP (usually a router) to avoid
double execution during isolation.

·        
Only the server with witness access goes ALONE,
the other waits.

·        
After isolation, the WAIT server resynchronizes
and becomes SECOND.
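
The two behaviors above can be summarized as a decision taken by each server when it loses all heartbeats. The Python sketch below is a simplified reading of this section. The function name `state_on_heartbeat_loss` and its parameters are assumptions for illustration, not SafeKit configuration keywords.

```python
def state_on_heartbeat_loss(splitbrain_checker: bool, witness_reachable: bool) -> str:
    """State a server moves to after losing every heartbeat with its peer."""
    if not splitbrain_checker:
        return "ALONE"                 # without the checker, both servers go ALONE
    return "ALONE" if witness_reachable else "WAIT"


# Network isolation without the checker: both servers run the application.
print(state_on_heartbeat_loss(False, True), state_on_heartbeat_loss(False, False))
# ALONE ALONE

# With the checker: only the server that still reaches the witness IP goes ALONE.
print(state_on_heartbeat_loss(True, True), state_on_heartbeat_loss(True, False))
# ALONE WAIT
```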

### 1.2.8         3-node replication

SafeKit only supports replication between
two nodes. However, it is possible to implement 3-node replication by combining
SafeKit with a backup solution.

An application is made highly available
between 2 nodes thanks to SafeKit with its synchronous real-time replication
(no data loss) and automatic failover. Additionally, a backup solution is
implemented for asynchronous replication to a third node in a disaster recovery
site. Since there is data loss with an asynchronous backup solution, the
failover to the third node is manual and decided by an administrator.

Note that the real-time replication of
SafeKit does not eliminate the need for a backup solution. For example, a
ransomware attack encrypting replicated data on the primary server will also
encrypt data on the secondary server in real-time with SafeKit. Only a backup
solution with a retention policy can resolve a ransomware attack. The
administrator must restore the backup from before the ransomware attack.

### 1.2.9         SafeKit on a single node to protect against software failures

You can configure an application module in
"light" mode, which corresponds to a module running on a single node
without synchronizing with other nodes (unlike mirror or farm modules). A light
module includes the start and stop of an application, as well as SafeKit
checkers that detect software errors and perform automatic restarts on a single
node.

The light module interfaces with the
SafeKit console, allowing an administrator to view the status of the
application module and manually trigger application restarts using a
button-click interface.

There is no need to define a virtual IP
address or replicated directories in a light module. Note that this can also
serve as a first step before transitioning to a mirror module or a farm module.

## 1.3             The SafeKit farm cluster

### 1.3.1         Network load balancing and application failover

![](safekituserguideen_fichiers/image016.jpg)

The farm cluster is an active-active
high-availability solution, built by deploying a farm module within a cluster
of two or more nodes. The farm cluster provides both network load balancing,
through transparent distribution of network traffic, and software and hardware
failover. This architecture offers a simple solution to support the increase in
system load.

The same application runs on each server,
and the load is balanced by the distribution of network activity on the
different servers of the farm.

Farm clusters are suited to front-end
applications like web services.

Apache, Microsoft IIS, and NGINX solutions are
examples of farm modules. You can write your own farm module for your
application, based on the generic farm.safe module. See here for a list of modules.

### 1.3.2         Principle of a virtual IP address with network load balancing

The virtual IP address is configured
locally on each server in the farm. Input traffic for this address is
distributed among all servers by a filter within each server’s kernel.

The load balancing algorithm inside the
filter is based on the identity of the client packets (client IP address,
client TCP port). Depending on the identity of the client packet, only one
filter on a server accepts the packet. Once a packet is accepted by the filter
on a server, only the CPU and memory of that server are used by the application
responding to the client’s request. The output messages are sent directly from
the application server to the client.

If a server fails, the SafeKit heartbeat
protocol in a farm reconfigures the filters to re-balance the traffic among the
remaining available servers.
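
The principle of a filter that runs on every server and accepts each packet on exactly one of them can be sketched as follows. The hashing rule used here (client IP as an integer plus the client port, modulo the number of live servers) is a placeholder chosen for the example. The actual SafeKit algorithm is not described in this section, and the function name `accepts` is hypothetical.

```python
from ipaddress import ip_address


def accepts(server: str, alive: list[str], client_ip: str, client_port: int) -> bool:
    """Filter decision taken locally by `server` for one incoming packet."""
    members = sorted(alive)            # every server computes the same ordering
    bucket = (int(ip_address(client_ip)) + client_port) % len(members)
    return members[bucket] == server


farm = ["server1", "server2", "server3"]
packet = ("192.0.2.10", 50321)

# Every server sees the packet, but only one filter accepts it.
print([s for s in farm if accepts(s, farm, *packet)])

# After a failure, the same rule applied to the remaining servers
# re-balances the traffic without any central dispatcher.
survivors = ["server1", "server2"]
print([s for s in survivors if accepts(s, survivors, *packet)])
```

Because each server evaluates the same deterministic rule on the same membership list, no server needs to forward packets to another one, which is consistent with replies going directly from the application server to the client.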

### 1.3.3         Load balancing for stateful or stateless web services

With a stateful server, session affinity is
required. The same client must connect to the same server across multiple TCP
sessions to retrieve its context. In this scenario, the SafeKit load balancing
rule is configured on the client IP address. This ensures that the same client
always connects to the same server for multiple TCP sessions, while different
clients are distributed across various servers in the farm.

With a stateless server, there is no
session affinity. The same client can connect to different servers in the farm
across multiple TCP sessions, as no context is stored locally on a server from
one session to another. In this case, the SafeKit load balancing rule is
configured on the TCP client session identity. This configuration is optimal
for distributing sessions between servers but requires a TCP service without
session affinity.
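
The effect of the two rules can be illustrated by changing only the key on which the balancing decision is computed. In the Python sketch below, the rule labels `client_ip` and `client_session`, the function name `target`, and the modulo arithmetic are assumptions made for the example. They are not the SafeKit configuration syntax.

```python
from ipaddress import ip_address


def target(rule: str, servers: list[str], client_ip: str, client_port: int) -> str:
    """Pick the server that handles a TCP session under a given balancing rule."""
    ip = int(ip_address(client_ip))
    if rule == "client_ip":
        key = ip                      # stateful service: ignore the client port
    elif rule == "client_session":
        key = ip + client_port        # stateless service: port is part of the key
    else:
        raise ValueError(f"unknown rule: {rule}")
    return servers[key % len(servers)]


farm = ["server1", "server2", "server3"]
ports = [40000, 40001, 40002]         # three TCP sessions from the same client

sticky = {target("client_ip", farm, "192.0.2.10", p) for p in ports}
spread = {target("client_session", farm, "192.0.2.10", p) for p in ports}
print(len(sticky), len(spread))       # 1 3
```

With the rule on the client IP address, the three sessions land on one server, which preserves the client's context. With the rule on the session identity, the same sessions are spread over the farm.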

### 1.3.4         Chain high availability solution in a farm

What is a chain HA solution (also known as
a cascading HA solution)?

·        
Multiple servers are linked in a sequence: If
one server fails, the next one in the chain takes over.

·        
Priority-based management: A single server, the
one with the highest priority in the chain and which is available, manages all
requests from clients.

·        
Failover process: If the server with the highest
priority fails, the next available server with the highest priority takes over.

·        
Reintegration: When a server comes back online
and has the highest priority, it resumes handling all client requests.

·        
Quick recovery time: This solution has a quick
recovery time, as the application is pre-started on all servers. The recovery
time is essentially the time needed to reconfigure the priorities among the
servers in the farm (a few seconds).

·        
Replication limitations: This solution does not
support real-time replication, which is limited to the mirror architecture.
However, a combined farm+mirror architecture is available.

To implement a chain high availability
solution, SafeKit offers a "power" variable in the load balancing
rules, which is set at the level of each server in the cluster. The power
variable allows you to allocate more or less traffic to a server. When the
power variable is set as a multiple of 64 between servers (e.g., 1, 64, 64\*64,
64\*64\*64, ...), the chain high availability solution is implemented.
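
The behavior described in this section can be modeled with a few lines of Python. The sketch only reproduces the observable result (the available server with the highest power handles all requests). It does not show how SafeKit turns power values into traffic shares, and the function names `chain_powers` and `active_server` are hypothetical.

```python
from typing import Optional


def chain_powers(n: int) -> list[int]:
    """Power values spaced by a factor of 64: 1, 64, 64*64, ..."""
    return [64 ** i for i in range(n)]


def active_server(power: dict[str, int], available: set[str]) -> Optional[str]:
    """Available server with the highest power, or None if all are down."""
    candidates = [name for name in power if name in available]
    return max(candidates, key=power.get, default=None)


power = dict(zip(["server1", "server2", "server3"], chain_powers(3)))
print(power)                                                        # {'server1': 1, 'server2': 64, 'server3': 4096}
print(active_server(power, {"server1", "server2", "server3"}))      # server3
print(active_server(power, {"server1", "server2"}))                 # server2 (failover)
print(active_server(power, {"server1", "server2", "server3"}))      # server3 (reintegration)
```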

## 1.4             Clusters running several modules

### 1.4.1         The SafeKit farm+mirror cluster

**Network load
balancing, file replication and application failover**

You can mix farm and mirror modules on the
same cluster.

This option allows you to implement a
multi-tier application architecture, such as apache\_farm.safe (farm
architecture with load balancing and failover) and postgresql.safe (mirror
architecture with file replication and failover) on the same servers.

![](safekituserguideen_fichiers/image017.jpg)

As a result, load balancing, file
replication and failover are managed coherently on the same servers.

### 1.4.2         The SafeKit active/active cluster with replication

**Crossed
replication and mutual failover**

In an active/active cluster with
replication, there are two servers and two mirror modules in mutual failover
(appli1.safe and appli2.safe). Each application server is the backup of the other
server.

![](safekituserguideen_fichiers/image018.jpg)

If one application server fails, both
applications will run on the same physical server. Once the failed server is
restarted, its application will return to its default primary server.

A mutual failover cluster is more
cost-effective than two separate mirror clusters, as it eliminates the need for
backup servers that remain idle most of the time, waiting for a primary server
to fail. However, in the event of a server failure, the remaining server must
be capable of handling the combined workload of both applications.

Note that:

·        
Both applications, Appli1 and Appli2, must be
installed on each server to enable application failover.

·        
This architecture is not limited to just two
applications; N application modules can be deployed across two servers.

·        
Each mirror module will have its own virtual IP
address, its own replicated file directories, and its own restart scripts.

### 1.4.3         The SafeKit N-1 cluster

**Replication and
application failover from N servers to 1**

In an N-1 cluster, N mirror application
modules are deployed across N primary servers and a single backup server.

![](safekituserguideen_fichiers/image019.jpg)

Unlike in an active/active cluster, the backup server does not need to manage
a double workload when a primary server fails, provided that only one failure
occurs at a time. The solution can support multiple simultaneous primary server
failures, but in that case the single backup server must handle the
combined workload of all the failed servers.

Note that:

·        
All applications (Appli1, Appli2, Appli3) must
be installed on the single backup server to enable application failover.

·        
Each mirror module will have its own virtual IP
address, its own replicated file directories, and its own restart scripts.

## 1.5             The SafeKit Hyper-V or KVM cluster

### 1.5.1         Load balancing, replication, failover of entire virtual machines

The Hyper-V or KVM cluster is an example of
an active-active cluster. Multiple applications can be hosted in various
virtual machines, which are replicated and restarted by SafeKit. Each virtual
machine is managed by SafeKit within its own mirror module.

![](safekituserguideen_fichiers/image020.png)

The solution has the following features:

·        
Real-time synchronous replication of entire
virtual machines with failover capabilities.

·        
A centralized, user-friendly SafeKit console for
managing all VMs, including the ability to migrate VMs between servers to
optimize load distribution.

·        
A checker for each VM to detect if it has locked
up, crashed, or ceased to function, and to restart the VM if necessary.

·        
An attractive solution that requires no
application integration.

·        
A robust architecture suitable for
high-availability solutions that cannot be integrated at the application level.

A free trial of the Hyper-V cluster with SafeKit is available here.

A free trial of the KVM cluster with SafeKit is available here.

## 1.6             SafeKit clusters in the cloud

For a full description, refer to section 16.

### 1.6.1         Mirror cluster in Azure, AWS and GCP

SafeKit delivers high-availability clusters
with real-time replication and failover in Azure, AWS, and GCP through the
deployment of a mirror module.

![](safekituserguideen_fichiers/image021.png)

The mirror solution in the cloud is similar
to the on-premises one, except that the virtual IP address must be configured at
the load balancer level:

·        
Virtual machines are placed in different
availability zones, which are in different subnets.

·        
The critical application runs on the primary
server.

·        
Users connect to a primary/secondary virtual IP
address managed by the cloud load balancer.

·        
SafeKit provides a health check configured in
the load balancer. On the primary server, the health check returns OK to the
load balancer, while it returns nothing on the secondary server. Thus, all
requests to the virtual IP address are routed to the primary server.

·        
If the primary server fails or is stopped, the
secondary server automatically becomes the primary one and returns OK to the
health check. Thus, all requests to the virtual IP address are rerouted to the
new primary server.

·        
SafeKit monitors the critical application on the
primary server using SafeKit checkers.

·        
SafeKit automatically restarts the critical
application in the event of software or hardware failure, thanks to restart
scripts.

·        
SafeKit performs synchronous real-time
replication of files containing critical data.

For more information, refer to mirror cluster in Azure, mirror cluster in AWS or mirror cluster in GCP.
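
The interaction between the health check and the cloud load balancer described above can be sketched as follows. This is a simplified model, not a cloud provider API: the role labels, the VM names, and the functions `health_check` and `backends` are hypothetical, and a failed health check is represented by `False` rather than by a missing reply.

```python
def health_check(role: str) -> bool:
    """Health check result as seen by the load balancer for one server."""
    return role == "primary"          # the secondary (or a stopped server) does not answer OK


def backends(roles: dict[str, str]) -> list[str]:
    """Servers that receive the traffic sent to the virtual IP address."""
    return [name for name, role in roles.items() if health_check(role)]


# Normal operation: all requests go to the primary.
print(backends({"vm-zone-a": "primary", "vm-zone-b": "secondary"}))   # ['vm-zone-a']

# After a failure of the primary, the secondary becomes primary
# and the load balancer reroutes the traffic to it.
print(backends({"vm-zone-a": "down", "vm-zone-b": "primary"}))        # ['vm-zone-b']
```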

### 1.6.2         Farm cluster in Azure, AWS and GCP

SafeKit delivers high-availability clusters
with network load balancing and failover in Azure, AWS, and GCP through the
deployment of a farm module.

![](safekituserguideen_fichiers/image022.png)

The farm solution in the cloud is similar
to the on-premises one, except that the virtual IP address must be configured at
the load balancer level:

·        
Virtual machines are placed in different
availability zones, which are in different subnets.

·        
The critical application runs on all servers.

·        
Users are connected to a virtual IP address
managed by the cloud load balancer.

·        
SafeKit provides a health check configured in
the load balancer. The health check returns OK on all servers running the
application.

·        
If a server fails or is stopped, the health check
returns nothing to the load balancer, which then stops routing requests to that
server.

·        
SafeKit monitors the critical application on all
servers using SafeKit checkers.

·        
SafeKit automatically restarts the critical
application on a server when there is a software failure, thanks to restart
scripts.

For more information, refer to farm cluster in Azure, farm cluster in AWS or farm cluster in GCP.