---
canonical: https://safekit.evidian.com/wp-content/uploads/downloads_safekit/version-82/safekituserguidehtml/documentation/safekituserguideen.htm
---

## 13.7 File replication - <rfs>, <replicated>

**For mirror modules only**

File replication (RFS) ensures high
availability, real-time synchronization, and fault tolerance for critical data.

Configuring RFS involves the following
constraints:

- In Linux, you must set the same uid/gid values on the two nodes for replicating file permissions. When replicating a filesystem mount point, you must apply the special procedure described in section 13.7.4.2.
- In Windows, it is strongly recommended to enable the USN journal on the drive that contains the replicated directory, as described in section 13.7.4.3.
- In Windows, moving a file to the Recycle Bin using the Delete key in File Explorer is not supported. Only permanent deletion using Shift + Delete is supported.
- Replicated directories are writable only on the primary node.
- The replicated directory tree can contain paths with spaces only on Windows.
- Hard links and file system transactions in Windows are not supported.

|  |  |
| --- | --- |
| Important | If you install and run several application modules on the same server, the replicated directories must be different for each application module. |

### 13.7.1 <rfs> example

- Example in Windows:

<rfs>

   <replicated dir="c:\safedir"/>

</rfs>

- Example in Linux:

<rfs>

   <replicated dir="/safedir"/>

</rfs>

|  |  |
| --- | --- |
| See also | A full example is given in section 15.1. For the configuration of a dedicated replication network, refer to section 15.1.2.2, which presents the configuration via the web console along with the corresponding userconfig.xml. |

### 13.7.2 <rfs> syntax

<rfs

  [acl="on"|"off"]

  [async="second"|"none"]

  [iotimeout="300s"]

  [roflags="0x10"|"0x10000"]

  [locktimeout="100s"]

  [sendtimeout="30s"]

  [nbrei="6"]

  [ruzone\_blocksize="8388608"]

  [namespacepolicy="0"|"1"|"3"|"4"]

  [reitimeout="150s"]

  [reicommit="0"]

  [reidetail="on"|"off"]

  [allocthreshold="0"]

  [nbremconn="1"]

  [checktime="220000ms"]

  [checkintv="120s"]

  [nfsbox\_options="cross"|"nocross"]

  [scripts="off"]

  [reiallowedbw="20000"]

  [syncdelta="0m"]

  [syncat="synchronization scheduling"]

\>

 <replicated dir="absolute path of a directory"

   [mode="read\_only"]

 \>

   <tocheck path="relative path of a file or subdir" />

   <notreplicated path="relative path of a file or subdir" />

   <notreplicated regexpath="regular expression on relative path of a file or subdir" />

   …

 </replicated>

</rfs>

|  |  |
| --- | --- |
| Note | Only the async, nbrei, reitimeout and reidetail attributes of the <rfs> tag can be changed with a dynamic configuration. The <flow> tag, which describes the replication flow, can also be changed dynamically. |
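
As an illustration of the note above, the following sketch compares the <rfs> attributes of two configuration texts and lists the changed attributes that are not in the dynamically changeable set. It is not a SafeKit tool: the function names are hypothetical, the comparison is purely textual (an omitted attribute and an attribute explicitly set to its default count as different), and the <replicated> and <flow> subtrees are not examined.

```python
import xml.etree.ElementTree as ET

# <rfs> attributes that the note above lists as changeable dynamically.
DYNAMIC_RFS_ATTRS = {"async", "nbrei", "reitimeout", "reidetail"}


def find_rfs(xml_text):
    """Return the first <rfs> element of a configuration text, or None."""
    root = ET.fromstring(xml_text)
    return root if root.tag == "rfs" else root.find(".//rfs")


def non_dynamic_changes(old_xml, new_xml):
    """List changed <rfs> attributes that are not in DYNAMIC_RFS_ATTRS."""
    old, new = find_rfs(old_xml), find_rfs(new_xml)
    old_attrs = old.attrib if old is not None else {}
    new_attrs = new.attrib if new is not None else {}
    changed = {
        name
        for name in set(old_attrs) | set(new_attrs)
        if old_attrs.get(name) != new_attrs.get(name)
    }
    return sorted(changed - DYNAMIC_RFS_ATTRS)


# Hypothetical before/after fragments, reduced to the <rfs> section.
old = '<rfs async="second" nbrei="6"><replicated dir="/safedir"/></rfs>'
new = '<rfs async="none" nbrei="8" acl="on"><replicated dir="/safedir"/></rfs>'
print(non_dynamic_changes(old, new))  # ['acl']
```

An empty list suggests that the change only touches attributes documented as dynamic; any other result means a regular (non-dynamic) configuration is needed.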

### 13.7.3 <rfs>, <replicated> attributes

 

|  |  |
| --- | --- |
| <rfs |  |
| [mountoversuffix= "*suffix*"] | **Linux only**  During the module configuration, the replicated directory "/a/dir" is renamed "/a/dir*suffix*". The directory **/a/dir** is then created and is:  ·         a mount point to "/a/dir*suffix*" when the module is started  ·         a link to "/a/dir*suffix*" when the module is stopped  By default, the *suffix* value is “\_For\_SafeKit\_Replication”.  **Note:** If there is a hard failure, the symbolic link is not restored. In this case, you must restore the symbolic link manually.  **Restriction:** You cannot explicitly specify a root file system as a replicated directory (because a directory rename is not allowed across file systems). The workaround is described in SK-0030.  **Important:** When the module is started, NEVER ACCESS files in "/a/dir*suffix*"; otherwise, the modifications are not replicated and the system becomes inconsistent. ALWAYS ACCESS replicated files through "/a/dir". |
| [acl=  "on" \| "off"] | Setting acl to on activates the replication of ACLs on files and directories.  Default value: off  **Restriction for Windows:** ACL replication does not work if the SYSTEM account does not have the "Full control" access right on the whole replicated tree. File ACLs are replicated literally (as SID values); therefore, ACLs granted to locally defined users and groups are meaningless on the remote system. File encryption and file compression attributes are not supported. |
| [async=  "second" \| "none"] | Setting async to second improves file replication performance: modification operations are cached on the secondary server and acknowledgements are sent more quickly to the primary server.  ·         async="none"  This mode is more robust: modification operations are written to disk on the secondary before the acknowledgement is sent to the primary.  ·         async="second"  If both the PRIM and SECOND servers fail at the same time and the PRIM server cannot restart, the SECOND server does not have up-to-date data on its disk. Data is lost if the SECOND server is forced to start as primary with the prim command.  Default value: second  **Note:** This attribute’s value can be changed with a dynamic configuration. |
| [packetsize] | **Linux only**  Maximum size in bytes for NFS replication packets. It must be lower than the maximum size allowed by the NFS server of both servers. When it is set into the configuration, it is used as mount options for rsize and wsize.  By default, the size is the one of the NFS server. |
| [reipacketsize=  "8388608"] | Maximum size in bytes of reintegration packets.  In Linux, this value must be less than or equal to packetsize.  Default value in Linux: the value of packetsize if it is set in the configuration and is lower than 8388608; otherwise 8388608  Default value in Windows: 8388608 bytes |
| [ruzone\_blocksize="8388608"] | Size of a zone for the modification bitmap of a file.  It must be a multiple of reipacketsize attribute.  Default value: value of reipacketsize if it is set into the configuration; else 8388608 |
| [iotimeout="300s"] | **Windows only**  I/O timeout in seconds in the Windows file system filter. If an I/O cannot be replicated and the timeout expires in the filter, the PRIM server becomes ALONE.  Default value: 300s (300 seconds)  **Note:** In SafeKit 7.4.0.5, the default value was 120 seconds. |
| [roflags="0x10"\|  "0x10000"] | **Windows only**  ·         roflags="0x10"  To ensure the consistency of the data replicated on the 2 servers, replicated directories and files must be modified only on the PRIM server. If changes are made on the SECOND server, they are reported in the module log together with the identification of the responsible process, so that the administrator can correct this anomaly.  ·         roflags="0x10000"  With this flag, since SafeKit 7.4.0.31, the module is also stopped on the SECOND server.  Default value: 0x10 |
| [locktimeout=  "100s"] | Timeout in seconds for replication requests. If a request cannot be served within this timeout, the PRIM server becomes ALONE.  Default value: 100s (100 seconds)  **Note:** Time unit supported since SafeKit 8.2.5 (see section 13.1). |
| [sendtimeout=  "30s"] | Since SafeKit > 7.4.0.5  Timeout in seconds for sending TCP packets to the remote node. If a packet cannot be sent within this timeout, the PRIM server becomes ALONE. Increase this value on slow networks.  Default value: 30s (30 seconds)  **Note:** Time unit supported since SafeKit 8.2.5 (see section 13.1).  **Note:** In SafeKit 7.4.0.5, the default value was 120 seconds. |
| [nbrei="6"] | Number of reintegration threads running in parallel for resynchronizing files.  Default value: 6  **Note:** This attribute’s value can be changed with a dynamic configuration.  **Note:** The default value was 3 before SafeKit 8.2.5.3. |
| [namespacepolicy="0"\|"1"\|"3"\|"4"] | ·         namespacepolicy="0"  Deactivates zone reintegration on Windows and Linux.  ·         namespacepolicy="1"  In Windows, zone reintegration after a reboot following a proper module stop is not active.  ·         namespacepolicy="3"  In Windows, allows zone reintegration after a reboot when possible. It activates the USN change journal on the volume containing the replicated directories (see the fsutil usn command for creating a USN change journal on a volume). Even with this configuration, full reintegration is used instead of zone reintegration when:  o    the USN change journal associated with the volume has been deleted/recreated for administration reasons  o    a discontinuity in the USN journal is detected  ·         namespacepolicy="4"  When zone synchronization is not possible (on the first reintegration or when zones are not available), the files that need to be synchronized are fully copied. If this reintegration does not complete, the next one copies these files again. To avoid this, set namespacepolicy="4". This option also enables USN journal checking in Windows.  Default value: 4 since SafeKit > 7.4.0.5 (not supported in previous releases) |
| [reitimeout=  "150s"] | Timeout in seconds for reintegration requests. The timeout can be increased to avoid reintegration failures when the primary server is heavily loaded.  Default value: 150s (150 seconds)  **Note:** Time unit supported since SafeKit 8.2.5 (see section 13.1).  **Note:** This attribute’s value can be changed with a dynamic configuration. |
| [reicommit="0"] | **Linux only**  Set reicommit="nb blocks" to commit every (nb blocks) \* reipacketsize when reintegrating one file (in addition to the commit at the end of the copy). This can help the reintegration of big files succeed, but it slows down reintegration.  Default value: 0, which means no intermediate commit |
| [reidetail=  "on"\|"off"] | Detailed logging for reintegration.  Default value: off  **Note:** This attribute’s value can be changed with a dynamic configuration. |
| [allocthreshold=  "0"] | **Windows only**  Size in GB above which the allocation policy is applied before reintegration.  When allocthreshold > 0, fast allocation of disk space is enabled for files to be synchronized on the secondary node. This feature avoids a timeout when the primary writes at the end of a file that is large (> 200 GB) and not yet completely copied.  Since SafeKit 7.4.0.64, the allocation policy has changed and is applied for:  ·         Newly created files (files that did not exist on the secondary when the reintegration starts)  ·         Files with size on the primary >= allocthreshold (size in GB)  ·         Full synchronization on the first reintegration; on start with full synchronization (safekit second\|prim fullsync); when synchronization by zones is disabled (namespacepolicy="0")  Default value: 0 (disables the feature) |
| [nbremconn="1"] | Number of TCP connections between the primary and the secondary nodes.  This value may be increased to improve the replication and synchronization throughput when the network has high latency (in cloud for instance).  Default value: 1 |
| [checktime=  "220000ms"] | **Linux only**  Timeout in milliseconds for the null request that checks the local replicated file system. Run the safekit stopstart command when the timeout is reached.  Default value: 220000ms (220000 milliseconds)   |  |  | | --- | --- | | Commentaire, ajouter contour | Time unit supported since SafeKit 8.2.5 (see section 13.1). | |
| [checkintv=  "120s"] | **Linux only**  Interval in seconds between two null requests.  Default value: 120s (120 seconds)   |  |  | | --- | --- | | Commentaire, ajouter contour | Time unit supported since SafeKit 8.2.5 (see section 13.1). | |
| [nfsbox\_options=  "cross"\|"nocross"] | **Windows only**  Specifies the policy to apply when a reparse point of type MOUNT\_POINT is present in the replicated directory tree. This policy applies to all replicated directories.  MOUNT\_POINT reparse points in NTFS can represent two types of objects: an NTFS mount point (for example the D:\ directory) or an NTFS "directory junction" (a form of "symbolic link" to another part of the file system namespace).  ·          nfsbox\_options="cross"  The MOUNT\_POINT reparse point object itself is not replicated/reintegrated. It is evaluated, and the reintegration/replication processes the target content as it would the content of a standard directory. This is useful, for instance, when a replicated directory is a mount point (e.g., replicating a "drive letter" root). This is the default configuration value.  ·          nfsbox\_options="nocross"  The MOUNT\_POINT reparse point object itself is replicated/reintegrated but not evaluated. Reintegration does not descend into the target of the reparse point. This is useful, for instance, when a replicated directory tree contains NTFS "junctions" that point to another part of the replicated tree (e.g., when replicating a PostgreSQL database, as PostgreSQL is known to need such objects).  Default value: cross |
| [scripts=  "on" \| "off"] | scripts="on" activates the \_rfs\_\* script callbacks used to implement specific data replication management.  Default value: off |
| [reiallowedbw="20000"] | When defined, this attribute specifies the maximum bandwidth that the reintegration phase may use, in kilobytes per second (KB/s); for instance, 20000 KB/s.  Due to implementation trade-offs, a +/-10% fluctuation of the bandwidth actually used is to be expected.  **Note:** The replication bandwidth is not affected by this parameter.  By default, the attribute is not defined and the bandwidth used by the reintegration is not limited. |
| [syncdelta="0m"] | ·          syncdelta <= 1  The attribute is ignored and the default failover and start policy is applied: only an up-to-date server can start as primary or run a failover.  ·          syncdelta > 1  It changes the default failover and start policy. The server that is not up to date can become primary, but only if the time elapsed since the last synchronization, in minutes, is lower than the syncdelta value (see section 13.7.4.4).  Default value: 0m (0 minutes)  **Note:** Time unit supported since SafeKit 8.2.5 (see section 13.1). |
| [syncat="*synchronization scheduling*"] | Default: real-time replication and automatic synchronization (no scheduling)  Use syncat to schedule the synchronization of replicated directories on the secondary node (see section 13.7.4.10). The module must be started to enable this feature. Once synchronized, the module blocks in the WAIT (NotReady) state until the next synchronization.  The scheduling is based on the native job scheduler:  ·          On Unix, the job is defined in the safekit user’s crontab  ·          On Windows, the job is defined as a system task  You must configure syncat with the syntax of the native job scheduler. For instance, to synchronize daily, just after midnight:  ·          in Windows  syncat="/SC DAILY /ST 00:01:00"  ·          in Unix  syncat="01 0 \* \* \*"  **Note:** See the crontab documentation in Unix and the schtasks.exe documentation in Windows for the full syntax of the scheduled date and time.  **Important:** Since the SafeKit configuration is just a front end to the job scheduler, when scheduling does not work, first check for syntax errors. |
| [<flow name ="network">    [<server  addr="IP\_1" />  <server  addr="IP\_2" /> ]   </flow>] | **Legacy** configuration preserved for backwards compatibility.  When this section is not defined, the replication flow uses the same network as the heartbeat with ident="flow" if there is one; otherwise, it uses the first heartbeat (see section 13.4).  If you define this section, keep it consistent with the heartbeat with ident="flow", if there is one, because the default failover rules apply to this heartbeat.  **Note:** This <flow> tag subtree can be changed with a dynamic configuration, for instance to set a new replication flow.  The name attribute of <flow> defines the network used for the replication flow. It must be present in the global cluster configuration (see section 12).  The <server> tag is a legacy syntax used in previous SafeKit versions (before 7.2). It is supported for compatibility reasons but must not be used for new modules.  **Important:** In the same userconfig.xml, you must not use both the syntax for SafeKit 7.1 and the one for SafeKit 7.2. |
| <replicated | Begin the definition of replicated directories.  Set as many lines as there are replicated directories. |
| dir="*abs\_path*" | Absolute path of a directory to replicate.  **Important:** Spaces in file paths are supported only on Windows. |
| [mode=  "read\_only"] | Read-only access rights on the secondary machine for replicated directories to avoid corruption |
| <notreplicated  path="*relative*" /> | Relative path of a file or sub-directory in a replicated directory. The file (or sub-directory) is not replicated. Set as many lines as there are non-replicated files or sub-directories.  **Important:** Spaces in file paths are supported only on Windows. |
| <notreplicated  regexpath="*regular expression*" /> | Regular expression on the name of entries under the replicated directory:  ·          **Replicate all except** entries matching the regular expression. For example, to avoid replicating entries with the extension .tmp or .bak in the /safedir directory or its sub-directories:  <replicated dir="/safedir">   <notreplicated regexpath=".\*\.tmp$" />   <notreplicated regexpath=".\*\.bak$" />  </replicated>  Note that /safedir/conf/config.tmp.swap is replicated.  ·          **Replicate only** those entries in the directory that match the regular expression after the **!**  For example, to replicate only entries with the extension .mdf or .ldf in the /safedir directory or its sub-directories:  <replicated dir="/safedir">   <notreplicated regexpath="**!**.\*\.mdf$" />   <notreplicated regexpath="**!**.\*\.ldf$" />  </replicated>  **Important:** Renaming between non-replicated and replicated files is not supported.  The regex engine is POSIX Extended regex (see POSIX documentation):  ·         in Windows, case-insensitive mode  ·         in Linux, case-sensitive mode  **Important:** As regular expressions are defined inside the XML file userconfig.xml, special characters interpreted by XML, like '<' or '>', cannot be used in regular expressions. |
| <tocheck  path="*relative*" /> | Relative path of a file or sub-directory in a replicated directory. Checks the presence of the file or sub-directory before starting the replication mechanism. Avoids errors such as starting replication on an empty file system. Set as many lines as there are files or sub-directories to check. |
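
The two regexpath forms described in the table can be tried out with a small sketch. It uses Python's re module rather than the POSIX Extended engine used by SafeKit, which behaves the same way for simple patterns like these. The function name is hypothetical, and the semantics (an entry is excluded if it matches any plain pattern, or kept only if it matches at least one "!" pattern) are inferred from the examples above, not taken from SafeKit itself.

```python
import re


def is_replicated(rel_path, regexpaths, case_insensitive=False):
    """Decide whether an entry is replicated under a list of regexpath values.

    Assumed semantics (inferred from the documented examples):
    - patterns starting with "!" mean "replicate only matching entries";
    - other patterns mean "replicate all except matching entries";
    - mixing both kinds in one list is not modelled here.
    """
    flags = re.IGNORECASE if case_insensitive else 0
    only = [p[1:] for p in regexpaths if p.startswith("!")]
    excluded = [p for p in regexpaths if not p.startswith("!")]
    if only and excluded:
        raise ValueError("mixing '!' and plain patterns is not modelled here")
    if only:
        return any(re.search(p, rel_path, flags) for p in only)
    return not any(re.search(p, rel_path, flags) for p in excluded)


temp_rules = [r".*\.tmp$", r".*\.bak$"]
print(is_replicated("conf/config.tmp", temp_rules))       # False
print(is_replicated("conf/config.tmp.swap", temp_rules))  # True

db_rules = [r"!.*\.mdf$", r"!.*\.ldf$"]
print(is_replicated("data/base.mdf", db_rules))                         # True
print(is_replicated("data/BASE.MDF", db_rules))                         # False
print(is_replicated("data/BASE.MDF", db_rules, case_insensitive=True))  # True
```

The last two calls mirror the documented difference between Linux (case-sensitive) and Windows (case-insensitive) matching.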

### 13.7.4 <rfs> description

#### 13.7.4.1  <rfs> prerequisites

See file replication prerequisites
described in section 2.2.4.

#### 13.7.4.2  <rfs> Linux

On Linux, data interception is based on a local NFS mount, and the replication flow between servers is based on the NFS v3 / TCP protocol.

NFS mounting of replicated directories from remote Unix clients is not supported. Other directories can be NFS-mounted with standard commands.

Procedure for replicating a mount point

When replicating a mount point in Linux, the module configuration fails with the error:

Error: Device or resource busy

In the following, we take the example of a PostgreSQL module that sets /var/lib/pgsql/var and /var/lib/pgsql/data as replicated directories. The userconfig.xml of the module contains:

<rfs … >

  <replicated dir="/var/lib/pgsql/var" mode="read\_only" />

  <replicated dir="/var/lib/pgsql/data" mode="read\_only" />

</rfs>

These directories are mount points, as shown by the output of the command df -H. It returns for instance:

/dev/mapper/vg01-lv\_pgs\_var … /var/lib/pgsql/var

/dev/mapper/vg02-lv\_pgs\_data … /var/lib/pgsql/data

You must apply the following procedure for
configuring the module to replicate these directories.

1. Unmount the file systems by running the commands:

umount /var/lib/pgsql/var

umount /var/lib/pgsql/data

2. Configure the module by running the command:

/opt/safekit/safekit config -m postgresql

The configuration should succeed (no errors).

3. Check the symbolic links created by running the command ls -l /var/lib/pgsql. It returns:

lrwxrwxrwx 1 root var -> var\_For\_SafeKit\_Replication

lrwxrwxrwx 1 root data -> data\_For\_SafeKit\_Replication

4. Edit /etc/fstab and replace the two lines:

/dev/mapper/vg01-lv\_pgs\_var /var/lib/pgsql/var ext4…

/dev/mapper/vg02-lv\_pgs\_data /var/lib/pgsql/data ext4…

with:

/dev/mapper/vg01-lv\_pgs\_var /var/lib/pgsql/var\_For\_SafeKit\_Replication ext4…

/dev/mapper/vg02-lv\_pgs\_data /var/lib/pgsql/data\_For\_SafeKit\_Replication ext4…

5. Mount the file systems by running the commands:

mount /var/lib/pgsql/var\_For\_SafeKit\_Replication

mount /var/lib/pgsql/data\_For\_SafeKit\_Replication
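
The /etc/fstab change in step 4 can be previewed with a short sketch before editing the real file. It only transforms text passed to it and writes nothing; the function name, the sample mount options ("defaults 0 2") and the /home entry are hypothetical, and the suffix is the default mountoversuffix value.

```python
SUFFIX = "_For_SafeKit_Replication"  # default mountoversuffix value


def retarget_fstab(fstab_text, replicated_dirs, suffix=SUFFIX):
    """Return fstab text in which each replicated mount point gets the suffix."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        is_comment = line.lstrip().startswith("#")
        # The second field of an fstab entry is the mount point.
        if len(fields) >= 2 and not is_comment and fields[1] in replicated_dirs:
            fields[1] += suffix
            line = " ".join(fields)  # note: column alignment is not preserved
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical fstab content; only the device and mount point come from the example.
sample = (
    "/dev/mapper/vg01-lv_pgs_var /var/lib/pgsql/var ext4 defaults 0 2\n"
    "/dev/mapper/vg02-lv_pgs_data /var/lib/pgsql/data ext4 defaults 0 2\n"
    "/dev/mapper/vg00-lv_home /home ext4 defaults 0 2\n"
)
print(retarget_fstab(sample, {"/var/lib/pgsql/var", "/var/lib/pgsql/data"}), end="")
```

Comparing the printed result with the current /etc/fstab before applying the change by hand helps avoid touching unrelated entries.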

|  |  |
| --- | --- |
| Important | ·          Apply this procedure on both nodes if the replicated directories are mount points on both nodes. Once applied, you can use the module as usual (safekit start, stop, etc.).  ·          The procedure is the same for all mount points that must be replicated. |

 

|  |  |
| --- | --- |
| Note | To prevent the module from starting on a non-mounted, empty directory, you can add to userconfig.xml a check for a file inside the replicated directory. Example for /var/lib/pgsql/var (do the same for /var/lib/pgsql/data with a file inside this directory that is always present):  <replicated dir="/var/lib/pgsql/var" mode="read\_only">      <tocheck path="postgresql.conf" />  </replicated> |

 

If you want to unconfigure the module (or uninstall the whole SafeKit package), you must reverse this procedure:

1. Unmount the file systems with:

umount /var/lib/pgsql/var\_For\_SafeKit\_Replication

umount /var/lib/pgsql/data\_For\_SafeKit\_Replication

2. Deconfigure the module with:

/opt/safekit/safekit deconfig -m postgresql

3. Edit /etc/fstab to undo the previous changes.

4. Mount the file systems with:

mount /var/lib/pgsql/var

mount /var/lib/pgsql/data

#### 13.7.4.3  <rfs> Windows

On Windows, data interception is based on a file system filter, and the replication flow between servers is based on the NFS v3 / TCP protocol.

The rfs filter may not work correctly with some antivirus software.

On Windows, you can remotely mount a replicated directory from a workstation. If you want to mount it with the virtual name instead of the numeric virtual IP address, you must set the following two registry keys on the server side:

[HKEY\_LOCAL\_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa]
"DisableLoopbackCheck"=dword:00000001

[HKEY\_LOCAL\_MACHINE\SYSTEM\CurrentControlSet\Services\lanmanserver\parameters]
"DisableStrictNameChecking"=dword:00000001

In Windows, to enable zone reintegration after a server reboot when the module has been successfully stopped, the <rfs> component uses the NTFS USN journal to verify that the information recorded on the zones is still valid after the reboot. When the check succeeds, zone reintegration can be applied to the file; otherwise, the file must be fully copied.

By default, only the system drive has an active USN journal. If the replicated directories are located on a drive other than the system drive, you must create the journal (with the fsutil usn command).

|  |  |
| --- | --- |
| See also | See SK-0066 for an example. |

#### 13.7.4.4  <rfs> replication and failover

With its
file-replication function, mirror architecture is particularly suitable for
providing high availability for back-end applications with critical data to protect
against failure. The reason is that the secondary
server data is strongly synchronized with the primary server data. A
synchronized server is considered as up-to-date and only an up-to-date server
can start as primary or run a failover.

If the application availability is more critical
than the application data, this default policy can be relaxed by allowing a server
to become primary if the time elapsed since the last synchronization is below a
configurable delay. This is configured by setting the syncdelta attribute
of the <rfs> tag:

- syncdelta <= 1

  The attribute is ignored and the default failover and start policy is applied. The default value is 0.

- syncdelta > 1

  When the last up-to-date server is not responding, the server that is not up to date can become primary, but only if the time elapsed since the last synchronization is lower than the syncdelta value (in minutes).

This feature is implemented with:

- rfs.synced resource

  When syncdelta is > 1, the rfs.synced resource is managed. This resource is UP if the replicated data are consistent and if the time elapsed since the last synchronization, in minutes, is lower than the syncdelta value.

- syncedcheck checker

  When syncdelta is > 1, this checker is running. It sets the value of the rfs.synced resource.

- rfs\_forceuptodate failover rule

  When syncdelta is > 1, the following failover rule is valid:

  rfs\_forceuptodate: if (heartbeat.\* == down && cluster() == down && rfs.synced == up && rfs.uptodate == down) then rfs.uptodate=up;

  This rule causes the server to start as primary when the up-to-date server is not responding, provided that the server is isolated and can be considered synchronized according to the syncdelta value.
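
The interplay between syncdelta, the rfs.synced resource and the rfs\_forceuptodate rule can be modelled with a short sketch. It is a simplified reading of the description above, not SafeKit code: the NodeState class, its field names and both functions are hypothetical, and heartbeats\_up stands for "at least one heartbeat is up".

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    heartbeats_up: bool        # False corresponds to heartbeat.* == down
    cluster_up: bool           # False corresponds to cluster() == down
    data_consistent: bool      # replicated data are consistent
    minutes_since_sync: float  # time elapsed since the last synchronization
    uptodate: bool             # rfs.uptodate


def rfs_synced(state, syncdelta):
    """rfs.synced is UP only if syncdelta > 1, data are consistent and the sync is recent."""
    if syncdelta <= 1:
        return False  # attribute ignored: the resource is not managed
    return state.data_consistent and state.minutes_since_sync < syncdelta


def apply_rfs_forceuptodate(state, syncdelta):
    """Apply the rfs_forceuptodate rule; return True if rfs.uptodate was forced up."""
    if syncdelta <= 1:
        return False  # default policy: the rule is not valid
    if (not state.heartbeats_up and not state.cluster_up
            and rfs_synced(state, syncdelta) and not state.uptodate):
        state.uptodate = True
        return True
    return False


recent = NodeState(heartbeats_up=False, cluster_up=False, data_consistent=True,
                   minutes_since_sync=12, uptodate=False)
print(apply_rfs_forceuptodate(recent, syncdelta=30))  # True

stale = NodeState(heartbeats_up=False, cluster_up=False, data_consistent=True,
                  minutes_since_sync=45, uptodate=False)
print(apply_rfs_forceuptodate(stale, syncdelta=30))  # False
```

In the first case the isolated server synchronized 12 minutes ago, within the 30-minute syncdelta, so it may be considered up to date; in the second case the last synchronization is too old and the default policy still applies.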

#### 13.7.4.5  <rfs> replication verification

For a module named *AM*, you can check that files are identical on the primary and the secondary by running the following command on the SECOND server: safekit rfsverify -m *AM*. Run safekit rfsverify -m *AM* > log to redirect the command output into the file named log.

The output of the command is a log similar to the reintegration log, in which the files to be copied (and therefore different) are listed. When there is activity on the replicated directories on the primary, an anomaly may be reported even though the files do not differ, in the following cases:

- on Windows, because modifications are made on disk before being replicated
- with async="second" (default), because reads can bypass the asynchronous writes

To check whether there really is an inconsistency, re-run the command on the secondary server after making sure that there is no more activity on the primary.

On Windows, some files are systematically reported as erroneous by the verifier although there is no difference. This occurs when files are modified with SetFileValidData: files are extended without zeroing the newly extended region, and reads return random data from the disk.

|  |  |
| --- | --- |
| Note | It is strongly recommended to run this command only when there are no accesses to the replicated directories on the primary. |

#### 13.7.4.6  <rfs> file changes since the last synchronization

Before starting a secondary server, it may be useful to evaluate the number of files and the amount of data that have changed on the primary server since the secondary server stopped. To do so, run the following command on the ALONE server: safekit rfsdiff -m *AM*. Run safekit rfsdiff -m *AM* > log to redirect the command output into the file named log.

This command performs online checks of the content of the regular files of the module *AM*. It scans the entire replicated tree and displays the number of files that have been modified as well as the amount of data that needs to be copied. It also displays an estimate of the synchronization duration. This is only an estimate, since only regular files are scanned and other modifications may occur before the secondary server runs the synchronization.

This command must be used with caution on a production server since it adds overhead on the server (for reading trees and files with locking). On Windows, renaming files can fail during the evaluation.

|  |  |
| --- | --- |
| Note | It is strongly recommended to run this command only when there are no accesses to the replicated directories. |

#### 13.7.4.7  <rfs> replication and reintegration bandwidth

The replication component monitors, on the PRIM
server, the bandwidth used by replication and reintegration write requests.

Two resources (rfs.rep\_bandwidth and rfs.rei\_bandwidth) reflect the average bandwidth used by replication and reintegration respectively during the last 3 seconds, expressed in kilobytes per second (KB/s).

If the replication load is IO intensive,
the reintegration phase may saturate the network link and significantly slow
down the application. In such a case, the <rfs> reiallowedbw attribute may be used to limit the bandwidth taken by the
reintegration phase (see section 13.7.3). Please note that limiting the
reintegration bandwidth will make the reintegration phase longer.

Two other resources reflect the network bandwidth (in KB/s) used between the nfsbox processes that run on each node to implement replication and reintegration:

- rfs.netout\_bandwidth is the network output bandwidth
- rfs.netin\_bandwidth is the network input bandwidth

You can observe the value of rfs.netout\_bandwidth on the primary or rfs.netin\_bandwidth on the secondary to know the modification rate (write, create, delete, …) at the time of observation. The history of the resource values gives an overview of how the rate evolves over time.

The value of the bandwidth depends on the
application, system, and network activity. Its measurement is available for
information purposes only.
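
To get a feel for how a reiallowedbw cap lengthens the reintegration phase, the following sketch estimates the duration of a capped reintegration, including the +/-10% fluctuation mentioned in section 13.7.3. It is a back-of-the-envelope calculation only: the function name is hypothetical, 1 KB is assumed to be 1024 bytes, and the cap is assumed to be the only bottleneck.

```python
def reintegration_seconds(bytes_to_copy, reiallowedbw_kbs, tolerance=0.10):
    """Return (fastest, nominal, slowest) durations in seconds for a capped reintegration.

    Assumes 1 KB = 1024 bytes and that the bandwidth cap is the only limit.
    """
    if reiallowedbw_kbs <= 0:
        raise ValueError("reiallowedbw must be positive")
    kb_to_copy = bytes_to_copy / 1024
    nominal = kb_to_copy / reiallowedbw_kbs
    fastest = kb_to_copy / (reiallowedbw_kbs * (1 + tolerance))
    slowest = kb_to_copy / (reiallowedbw_kbs * (1 - tolerance))
    return fastest, nominal, slowest


# 100 GiB to reintegrate with reiallowedbw="20000".
fast, nominal, slow = reintegration_seconds(100 * 1024**3, 20000)
print(f"{fast / 60:.0f} to {slow / 60:.0f} minutes (nominal {nominal / 60:.0f})")
# 79 to 97 minutes (nominal 87)
```

The amount of data to copy can be taken from the output of safekit rfsdiff (see section 13.7.4.6) when choosing a value for the cap.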

#### 13.7.4.8  <rfs> synchronization by date

Since SafeKit 7.2, the command safekit secondforce -d date -m *AM* forces the module *AM* to start as secondary after copying only the files modified after the specified date.

|  |  |
| --- | --- |
| Important | This command must be used with caution since the synchronization will not copy files modified before the specified date. It is the administrator’s responsibility to ensure that these files are consistent and up to date. |

The date is in the format of YYYY-MM-DD[Z]
or "YYYY-MM-DD hh:mm:ss[Z]" or YYYY-MM-DDThh:mm:ss[Z], where:

- YYYY-MM-DD indicates
  the year, month, and day
- hh:mm:ss indicates
  the hours, minutes, and seconds
- Z indicates that the
  time is in UTC time zone; when not set the time is in local time zone

For instance:

- safekit secondforce -d 2016-03-01 -m *AM* for copying only files modified after the 1st of March 2016
- safekit secondforce -d "2016-03-01 12:00:00" -m *AM* for copying only files modified after the 1st of March 2016 at
  12h, local time zone
- safekit secondforce -d 2016-03-01T12:00:00Z -m *AM* for copying only files modified after the 1st of March 2016 at
  12h, UTC time zone
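
The date formats listed above can be checked before the command is run. The following Python sketch is an illustration only, not part of SafeKit: the helper names `parse_sync_date` and `secondforce_args` are hypothetical, and the check is a rough one (Python's `strptime` also tolerates non-padded values such as 2016-3-1). It only builds the argument list of the documented command and does not execute it.

```python
from datetime import datetime, timezone

# The three date layouts documented for `safekit secondforce -d`.
_FORMATS = ("%Y-%m-%d", "%Y-%m-%d %H:%M:%S", "%Y-%m-%dT%H:%M:%S")


def parse_sync_date(text: str) -> datetime:
    """Parse a date written in one of the documented layouts.

    A trailing "Z" means UTC. Without it the value is local time and is
    returned as a naive datetime (hypothetical helper, not a SafeKit API).
    """
    is_utc = text.endswith("Z")
    body = text[:-1] if is_utc else text
    for fmt in _FORMATS:
        try:
            parsed = datetime.strptime(body, fmt)
        except ValueError:
            continue
        return parsed.replace(tzinfo=timezone.utc) if is_utc else parsed
    raise ValueError(f"unsupported date format: {text!r}")


def secondforce_args(date: str, module: str) -> list[str]:
    """Build the argument list for `safekit secondforce -d date -m AM`."""
    parse_sync_date(date)  # raises ValueError on a malformed date
    return ["safekit", "secondforce", "-d", date, "-m", module]
```

Passing the date as a single list element avoids shell quoting issues for the format that contains a space.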

This command may be useful in the following
case:

·        
the module is stopped on the primary server and
a backup of the replicated data is done (on a removable drive for instance)

·        
the module is stopped on the secondary server
and the replicated data is restored from the backup. It may be the first
start-up or the repair of the secondary server.

·        
the module is started on the primary server that
becomes ALONE

·        
the module is started on the secondary with the
command safekit
secondforce -d date -m *AM* where the date is the
backup date

In this case, only the files modified since
the backup date are copied, each one in its entirety, instead of all
the files.
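
Before running a synchronization by date, an administrator may want to preview which files on the primary node are more recent than the backup. The following Python sketch is an illustration only: it is not how SafeKit selects files, the function name `files_modified_after` is hypothetical, and it relies solely on the file modification time reported by the operating system.

```python
import os
from datetime import datetime
from pathlib import Path


def files_modified_after(root: str, cutoff: datetime) -> list[Path]:
    """Return the files under root whose modification time is later than cutoff."""
    limit = cutoff.timestamp()  # a naive datetime is interpreted as local time
    selected = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.stat().st_mtime > limit:
                selected.append(path)
    return sorted(selected)
```

Files that are not returned by such a check are the ones the administrator must verify manually, since the synchronization by date will not copy them.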

|  |  |
| --- | --- |
| Important | In Windows, the file modification date on the secondary server is changed when the file is copied by the synchronization process. Therefore, running safekit secondforce -d date -m *AM* with a date prior to the last reintegration on this server is of no use. |

#### 13.7.4.9  <rfs> external synchronization

On the first synchronization, all
replicated files are fully copied from the primary node to the secondary node. During
the following synchronizations, which are needed when the secondary node comes back, only
the zones of files that were modified on the primary node during the secondary
node downtime are copied. When the replicated
directories are voluminous, the first synchronization can take a long time, especially
if the network is slow. For this reason, since SafeKit > 7.3.0.11, SafeKit
provides a feature, to be used in conjunction with a backup tool, for
synchronizing a large amount of data.

On the primary node, simply back up the
replicated directories and set the synchronization policy to external
mode. The backup is transported (using an external drive for instance) and
restored on the secondary node, which is also configured to perform external
synchronization. When the module starts on the secondary node, it copies only
the file areas that were modified on the primary node since the backup.
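
To give an idea of what copying only the modified file areas saves compared with a full copy, the following Python sketch compares two versions of a file content block by block. It is a conceptual illustration only: the mechanism SafeKit uses to track modified areas is not described here, and the function name `changed_blocks` and the block size are assumptions made for the example.

```python
def changed_blocks(old: bytes, new: bytes, block_size: int = 4096) -> list[int]:
    """Return the indexes of the fixed-size blocks that differ between old and new."""
    length = max(len(old), len(new))
    block_count = (length + block_size - 1) // block_size
    return [
        index
        for index in range(block_count)
        if old[index * block_size:(index + 1) * block_size]
        != new[index * block_size:(index + 1) * block_size]
    ]
```

With a large file in which only a few blocks changed since the backup, only those blocks would need to cross the network, which is the benefit the external synchronization aims for.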

The external synchronization relies on the
SafeKit command safekit
rfssync, which must be run on both nodes to set the
synchronization policy to external. This command requires two arguments:

·        
the role of the node (prim | second)

·        
a unique identifier (uid)

External synchronization procedure

The external synchronization procedure,
described below, is the procedure to be followed in the case of a cold backup
of the replicated directories. In this case, the application must be stopped,
and any modification of the replicated directories is prohibited until the module
and the application are started in the ![](safekituserguideen_fichiers/image359.jpg)ALONE (Ready) state. The order of
operations must be strictly adhered to.

![](safekituserguideen_fichiers/image378.jpg)

 

The external synchronization procedure,
described below, is the procedure to be followed in the case of a hot backup of
replicated directories. In this case, the module is in the ![](safekituserguideen_fichiers/image379.jpg)ALONE (Ready) state, the
application is started, and changes to the contents of the replicated
directories are allowed. The order of operations must be strictly adhered to.

 

![](safekituserguideen_fichiers/image380.jpg)

safekit
rfssync command

|  |  |
| --- | --- |
| safekit rfssync external prim *uid* [-m *AM*] | Set the synchronization policy to external. It is identified by the value of *uid* (24 characters maximum).  The node is the primary one, the source for synchronizing data. |
| safekit rfssync external second *uid* [-m *AM*] | Set the synchronization policy to external. It is identified by the value of *uid* (24 characters maximum).  The node is the secondary one, the destination for synchronizing data. |
| safekit rfssync -d prim *uid* [-m *AM*]  safekit rfssync -d second *uid* [-m *AM*] | Disable the detection of changes to the replicated directories between the cold backup/restore and the start of the module.  Important: use this option with caution since the external synchronization may not properly detect all changes to be copied. |
| safekit rfssync full [-m *AM*] | Set the synchronization policy to full. This will copy all files in their entirety on the next synchronization. |
| safekit rfssync | Display the current synchronization policy |
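
The constraints stated in the table (role limited to prim or second, *uid* of at most 24 characters) can be checked before the command is issued. The following Python sketch is an illustration only: the helper name `rfssync_external_args` is hypothetical, and the function only builds the argument list of the documented command without executing it.

```python
from typing import Optional

MAX_UID_LENGTH = 24  # limit stated in the table above


def rfssync_external_args(role: str, uid: str, module: Optional[str] = None) -> list[str]:
    """Build the argument list for `safekit rfssync external prim|second uid [-m AM]`."""
    if role not in ("prim", "second"):
        raise ValueError(f"role must be 'prim' or 'second', got {role!r}")
    if not uid or len(uid) > MAX_UID_LENGTH:
        raise ValueError(f"uid must contain 1 to {MAX_UID_LENGTH} characters")
    args = ["safekit", "rfssync", "external", role, uid]
    if module is not None:
        args += ["-m", module]  # -m AM is optional in the documented syntax
    return args
```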

 

Internals

The synchronization policy is represented
by module’s resources: usersetting.rfssyncmode, usersetting.rfssyncrole, usersetting.rfssyncuid and rfs.rfssync:

·        
usersetting.rfssyncmode="default"
(usersetting.rfssyncrole="default", usersetting.rfssyncuid="default")

These
values are associated with the standard synchronization policy, which is
applied by default. It consists of copying only the modified areas of the
files. When this policy cannot be applied, the modified files are copied in
their entirety.

·        
usersetting.rfssyncmode="full"
(usersetting.rfssyncrole="default", usersetting.rfssyncuid="default")

These
values are associated with the full synchronization policy. It is
applied:

·        
the first time the module is started after its
first configuration

·        
on safekit commands (safekit second|prim fullsync ; safekit
rfssync full ; safekit primforce ; safekit config ; safekit deconfig)

·        
on change of pairing for the module

The full synchronization
policy will copy all files in their entirety on the
next synchronization.

·        
usersetting.rfssyncmode="external", usersetting.rfssyncrole="prim
| second" and usersetting.rfssyncuid="uid"

These
values are associated with the external synchronization policy assigned
with the commands safekit
rfssync external prim uid and safekit rfssync external second
uid. The next synchronization will apply the external
synchronization policy.

·        
rfs.rfssync="up
| down"

This
resource is only up when the synchronization policy, defined by the previous resources,
can be applied.

When the synchronization policy is not the
default policy, the synchronization policy automatically returns to the default
mode after successful synchronization. To check the
state of resources, see section 7.4.
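
The way these resources combine can be summarized with a small Python sketch. It is an illustration only: the function name `describe_sync_policy` is hypothetical, and the resource values are assumed to have already been collected into a dictionary by whatever means is used to check the state of resources.

```python
def describe_sync_policy(resources: dict) -> str:
    """Summarize the synchronization policy from the module resource values."""
    mode = resources["usersetting.rfssyncmode"]
    if mode == "external":
        role = resources["usersetting.rfssyncrole"]  # "prim" or "second"
        uid = resources["usersetting.rfssyncuid"]
        label = f"external (role={role}, uid={uid})"
    elif mode in ("default", "full"):
        label = mode
    else:
        raise ValueError(f"unknown synchronization mode: {mode!r}")
    # rfs.rfssync is up only when the policy defined above can be applied.
    applicable = resources.get("rfs.rfssync") == "up"
    return f"{label}, {'applicable' if applicable else 'not applicable'}"
```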

In some cases, external synchronization cannot
be applied, and the secondary node stops with an error specified in the module
log. In this situation, you must either:

·        
complete the external synchronization procedure
if this has not been done in its entirety on the 2 nodes

·        
fully reapply the external synchronization
procedure on the 2 nodes

·        
revert to the full synchronization
policy (safekit
rfssync full command)

·        
apply the synchronization by date, using the
date of the backup (see section 13.7.4.8). Unlike
external synchronization, synchronization by date copies the files modified
on the primary node in their entirety (instead of just the modified parts).

#### 13.7.4.10  <rfs> scheduled synchronization

By default, SafeKit provides real-time file
replication and automatic synchronization. On a heavily loaded server or a
high-latency network, you may want to keep the secondary node only loosely synchronized.
For this, you can use the syncat attribute to schedule the synchronization of the replicated directories on
the secondary node. The module must be started to enable this feature. Once
synchronized, the module blocks in the WAIT (NotReady)
state until the next scheduled synchronization. It is implemented with:

·        
the resource rfs.syncat that is
set to up on the scheduled dates and set to down after the data
synchronization

·        
the failover rule rfs\_syncat\_wait that
blocks the module into the state WAIT (NotReady) until the rfs.syncat
resource is up

If you want to manually force the
synchronization, you can run the command: safekit set -r rfs.syncat -v up -m *AM* while the module is in the WAIT (NotReady) state.
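
The interaction between the rfs.syncat resource and the rfs\_syncat\_wait failover rule can be pictured with the toy model below. It is an illustration only, not SafeKit code: the class `SyncatModel` and its methods are hypothetical, and the synchronization itself is reduced to a counter.

```python
WAIT = "WAIT (NotReady)"


class SyncatModel:
    """Toy model of the rfs.syncat resource and the rfs_syncat_wait rule."""

    def __init__(self) -> None:
        self.syncat = "down"
        self.module_state = WAIT
        self.sync_count = 0

    def set_syncat_up(self) -> None:
        # Happens on a scheduled date, or manually with
        # `safekit set -r rfs.syncat -v up -m AM`.
        self.syncat = "up"

    def evaluate(self) -> None:
        """Apply the rule once: synchronize only while rfs.syncat is up."""
        if self.syncat != "up":
            self.module_state = WAIT  # rfs_syncat_wait keeps the module blocked
            return
        self.sync_count += 1      # stands in for the data synchronization
        self.syncat = "down"      # the resource is set to down after the synchronization
        self.module_state = WAIT  # blocked again until the next schedule
```

In this model, evaluating the rule while the resource is down changes nothing, which mirrors the module staying blocked between two scheduled dates.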

With syncat, you only need
to configure the synchronization schedule using the syntax of the
native job scheduler: crontab in Linux and schtasks.exe in Windows (see section 13.7.3).

