Troubleshooting Geo-replication
This section describes the most common troubleshooting scenarios related to GlusterFS Geo-replication.
Locating Log Files
Every Geo-replication session has the following three log files associated with it (four, if the slave is a gluster volume):
- Master-log-file - log file for the process that monitors the Master volume
- Slave-log-file - log file for the process that initiates the changes on the slave
- Master-gluster-log-file - log file for the maintenance mount point that the Geo-replication module uses to monitor the master volume
- Slave-gluster-log-file - the slave's counterpart of the Master-gluster-log-file
Master Log File
To get the Master-log-file for geo-replication, use the following command:
# gluster volume geo-replication MASTER SLAVE config log-file
For example:
# gluster volume geo-replication Volume1 example.com:/data/remote_dir config log-file
Slave Log File
To get the log file for geo-replication on the slave (glusterd must be running on the slave machine), use the following commands:
1. On the master, run the following command to display the session owner details:
# gluster volume geo-replication Volume1 example.com:/data/remote_dir config session-owner
5f6e5200-756f-11e0-a1f0-0800200c9a66
2. On the slave, run the following command:
# gluster volume geo-replication /data/remote_dir config log-file
/var/log/gluster/${session-owner}:remote-mirror.log
3. Substitute the session owner value (the output of Step 1) for ${session-owner} in the output of Step 2 to get the location of the log file:
/var/log/gluster/5f6e5200-756f-11e0-a1f0-0800200c9a66:remote-mirror.log
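The substitution in the last step can be scripted. The following is a minimal sketch in POSIX shell; render_slave_log_path is a hypothetical helper name (not a gluster command), and the template and session owner are the example values shown above.

```shell
#!/bin/sh
# Sketch: fill the ${session-owner} placeholder in the slave log-file template.
# render_slave_log_path is a hypothetical helper, not part of the gluster CLI.

render_slave_log_path() (
    template=$1
    owner=$2
    placeholder='${session-owner}'   # literal text, not a shell variable
    case $template in
        *"$placeholder"*)
            prefix=${template%%"$placeholder"*}   # text before the placeholder
            suffix=${template#*"$placeholder"}    # text after the placeholder
            printf '%s%s%s\n' "$prefix" "$owner" "$suffix"
            ;;
        *)
            # No placeholder present: return the template unchanged.
            printf '%s\n' "$template"
            ;;
    esac
)

render_slave_log_path \
    '/var/log/gluster/${session-owner}:remote-mirror.log' \
    '5f6e5200-756f-11e0-a1f0-0800200c9a66'
```

In practice the two arguments would be the outputs of the config log-file and config session-owner commands rather than literals.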
Rotating Geo-replication Logs
Administrators can rotate the log file of a particular master-slave
session as needed. When you run geo-replication's log-rotate
command, the log file is backed up with the current timestamp appended
to the file name, and a signal is sent to gsyncd to start logging to a new
log file.
To rotate a geo-replication log file
- Rotate the log file for a particular master-slave session using the following command:
# gluster volume geo-replication MASTER SLAVE log-rotate
For example, to rotate the log file of master Volume1 and slave example.com:/data/remote_dir:
# gluster volume geo-replication Volume1 example.com:/data/remote_dir log-rotate
log rotate successful
- Rotate the log files for all sessions of a master volume using the following command:
# gluster volume geo-replication MASTER log-rotate
For example, to rotate the log files of master Volume1:
# gluster volume geo-replication Volume1 log-rotate
log rotate successful
- Rotate the log files for all sessions using the following command:
# gluster volume geo-replication log-rotate
For example, to rotate the log files for all sessions:
# gluster volume geo-replication log-rotate
log rotate successful
Synchronization is not complete
Description: GlusterFS geo-replication did not synchronize the data completely but the geo-replication status displayed is OK.
Solution: You can enforce a full sync of the data by erasing the index and restarting GlusterFS geo-replication. After restarting, GlusterFS geo-replication begins synchronizing all the data. All files are compared by checksum, which can be lengthy and resource-intensive on large data sets.
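As a rough sketch of that sequence, the helper below prints the commands involved for the example session used in this section. It assumes that erasing the index corresponds to turning the geo-replication.indexing volume option off while the session is stopped; confirm this against the volume-options documentation for your release. full_resync and DRY_RUN are hypothetical names introduced for this example.

```shell
#!/bin/sh
# Sketch: print (or run) the commands for a forced full resync.
# full_resync and DRY_RUN are hypothetical names for this example.
# Assumption: the index is erased by setting geo-replication.indexing to off.

full_resync() (
    master=$1
    slave=$2
    run() {
        if [ "${DRY_RUN:-1}" = 1 ]; then
            printf '%s\n' "$*"      # dry run: only show the command
        else
            "$@"                    # real run: execute it
        fi
    }
    run gluster volume geo-replication "$master" "$slave" stop &&
    run gluster volume set "$master" geo-replication.indexing off &&
    run gluster volume geo-replication "$master" "$slave" start
)

# Dry run by default: prints the three commands without executing them.
full_resync Volume1 example.com:/data/remote_dir
```

The dry-run default is deliberate: a full resync is expensive, so the commands are only executed when DRY_RUN=0 is set explicitly.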
Issues in Data Synchronization
Description: Geo-replication displays the status as OK, but files are not synced; only directories and symlinks are synced, and the following error message appears in the log:
[2011-05-02 13:42:13.467644] E [master:288:regjob] GMaster: failed to sync ./some_file
Solution: Geo-replication invokes rsync v3.0.0 or higher on both the host and the remote machine. Verify that the required version is installed on both.
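One way to check the installed version is sketched below. version_ge and rsync_version are hypothetical helper names; the parsing assumes the first line of `rsync --version` has the form "rsync  version X.Y.Z  protocol version N" (some builds prefix the number with "v"). Run it on both the host and the remote machine.

```shell
#!/bin/sh
# Sketch: check that the local rsync meets the 3.0.0 minimum.
# version_ge and rsync_version are hypothetical helpers for this example.

version_ge() {
    # True when dotted version $1 >= $2, compared numerically field by field.
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, "."); m = split(b, y, ".")
        for (i = 1; i <= (n > m ? n : m); i++) {
            if (x[i] + 0 > y[i] + 0) exit 0
            if (x[i] + 0 < y[i] + 0) exit 1
        }
        exit 0
    }'
}

rsync_version() {
    # Assumes the version is the third field of the first output line.
    rsync --version 2>/dev/null |
        awk 'NR == 1 { v = $3; sub(/^v/, "", v); print v }'
}

if command -v rsync >/dev/null 2>&1; then
    found=$(rsync_version)
    if version_ge "$found" 3.0.0; then
        echo "rsync $found meets the 3.0.0 minimum"
    else
        echo "rsync $found is older than 3.0.0"
    fi
else
    echo "rsync is not installed"
fi
```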
Geo-replication status displays Faulty very often
Description: Geo-replication frequently displays the status as Faulty, with a backtrace similar to the following:
[2011-04-28 14:06:18.378859] E [syncdutils:131:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/local/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 152, in twrap
    tf(*aa)
  File "/usr/local/libexec/glusterfs/python/syncdaemon/repce.py", line 118, in listen
    rid, exc, res = recv(self.inf)
  File "/usr/local/libexec/glusterfs/python/syncdaemon/repce.py", line 42, in recv
    return pickle.load(inf)
EOFError
Solution: This error indicates that the RPC communication between the master gsyncd module and the slave gsyncd module is broken, which can happen for various reasons. Verify that all of the following prerequisites are met:
- Password-less SSH is set up properly between the host and the remote machine.
- FUSE is installed on the machine, because the geo-replication module mounts the GlusterFS volume using FUSE to sync data.
- If the Slave is a volume, the volume is started.
- If the Slave is a plain directory, the directory has already been created with the required permissions.
- If GlusterFS 3.2 or higher is not installed in the default location on the Master but under a custom prefix, gluster-command is configured to point to the exact location.
- If GlusterFS 3.2 or higher is not installed in the default location on the Slave but under a custom prefix, remote-gsyncd-command is configured to point to the exact location of gsyncd.
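The first four prerequisites above can be spot-checked from a shell. The following is a sketch only: all function names are hypothetical, /dev/fuse is assumed to be the FUSE device node (as on Linux), and "required permissions" is approximated as "writable by the current user".

```shell
#!/bin/sh
# Sketch: helper checks for the geo-replication prerequisites.
# All function names are hypothetical; adapt hosts and paths to your setup.

check_ssh() {
    # BatchMode makes ssh fail instead of prompting for a password.
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

check_fuse() {
    # Assumes the Linux FUSE device node; pass another path to override.
    [ -e "${1:-/dev/fuse}" ]
}

check_slave_volume_started() {
    # `gluster volume info` reports "Status: Started" for a started volume.
    gluster volume info "$1" | grep -q '^Status: Started'
}

check_slave_dir() {
    # Approximation: the directory exists and the current user can write to it.
    [ -d "$1" ] && [ -w "$1" ]
}

report() {
    # report LABEL COMMAND...: run the check quietly and print OK or FAIL.
    label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "OK   $label"
    else
        echo "FAIL $label"
    fi
}

report "FUSE device present" check_fuse
```

For the example session in this section, the remaining checks would be invoked as `report "password-less SSH" check_ssh example.com` on the master and `report "slave directory" check_slave_dir /data/remote_dir` on the slave.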
Intermediate Master goes to Faulty State
Description: In a cascading set-up, the intermediate master goes to the Faulty state with the following log:
raise RuntimeError ("aborting on uuid change from %s to %s" % \
RuntimeError: aborting on uuid change from af07e07c-427f-4586-ab9f-4bf7d299be81 to de6b5040-8f4e-4575-8831-c4f55bd41154
Solution: In a cascading set-up, the intermediate master is loyal to the original primary master. The above log means that the geo-replication module has detected a change in the primary master. If this is the desired behavior, delete the config option volume-id in the session initiated from the intermediate master.
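Before deleting the option, it can help to confirm which primary master the session was bound to and which one it now sees. The sketch below extracts the two UUIDs from the log message; uuid_change is a hypothetical helper that reads log text on standard input, and it assumes the message appears on a single line in the actual log file.

```shell
#!/bin/sh
# Sketch: pull the old and new volume UUIDs out of the "uuid change" log line.
# uuid_change is a hypothetical helper that reads log text on stdin.

uuid_change() {
    # Prints "<old-uuid> <new-uuid>" for each matching line.
    sed -nE 's/.*aborting on uuid change from ([0-9a-f-]{36}) to ([0-9a-f-]{36}).*/\1 \2/p'
}

printf '%s\n' 'RuntimeError: aborting on uuid change from af07e07c-427f-4586-ab9f-4bf7d299be81 to de6b5040-8f4e-4575-8831-c4f55bd41154' |
    uuid_change
```

The first UUID printed is the original primary master and the second is the newly detected one; to scan a whole log, redirect the Master-log-file into uuid_change instead of using printf.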