Release notes for Gluster 3.13.0

This is a major release that includes a range of features enhancing usability, enhancements to GFAPI for developers, and a set of bug fixes.

The most notable features and changes are documented on this page. A full list of bugs that have been addressed is included further below.

Major changes and features

Addition of summary option to the heal info CLI

Notes for users: The Gluster heal info CLI now has a 'summary' option displaying the statistics of entries pending heal, in split-brain and currently being healed, per brick.

Usage:

# gluster volume heal <volname> info summary

Sample output:

Brick <brickname>
Status: Connected
Total Number of entries: 3
Number of entries in heal pending: 2
Number of entries in split-brain: 1
Number of entries possibly healing: 0

Brick <brickname>
Status: Connected
Total Number of entries: 4
Number of entries in heal pending: 3
Number of entries in split-brain: 1
Number of entries possibly healing: 0

Adding the --xml option to the command produces the output in XML format.

NOTE: Summary information is gathered in the same way as the detailed information, so the summary command takes just as long to complete as the detailed one; it is not faster.
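For scripting against the plain-text output, the per-brick counters can be collected into dictionaries. The following is a minimal sketch that assumes the output has exactly the layout of the sample above; the function name `parse_heal_summary` is illustrative and not part of Gluster.

```python
# Sample text copied from the release notes; real brick names replace <brickname>.
SAMPLE = """\
Brick <brickname>
Status: Connected
Total Number of entries: 3
Number of entries in heal pending: 2
Number of entries in split-brain: 1
Number of entries possibly healing: 0

Brick <brickname>
Status: Connected
Total Number of entries: 4
Number of entries in heal pending: 3
Number of entries in split-brain: 1
Number of entries possibly healing: 0
"""


def parse_heal_summary(text):
    """Parse 'heal info summary' text into one dict per brick (hypothetical helper)."""
    bricks = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Brick "):
            # A "Brick ..." line opens a new per-brick record.
            current = {"brick": line[len("Brick "):]}
            bricks.append(current)
        elif current is not None and ":" in line:
            # Remaining lines are "label: value" pairs; counters become ints.
            key, _, value = line.partition(":")
            value = value.strip()
            current[key.strip()] = int(value) if value.isdigit() else value
    return bricks


summary = parse_heal_summary(SAMPLE)
split_brain_total = sum(b["Number of entries in split-brain"] for b in summary)
```

For anything beyond quick checks, parsing the --xml output is likely the more robust choice, since the text labels above may change between releases.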

Addition of checks for allowing lookups in AFR and removal of 'cluster.quorum-reads' volume option

Notes for users:

Previously, AFR never failed a lookup unless there was a gfid mismatch. This behavior changes with this release, as part of fixing Bug#1515572.

Lookups in replica-3 and arbiter volumes will now succeed only if there is quorum and there is a good copy of the file. That is, the lookup has to succeed on a quorum of bricks, and at least one of them has to hold a good copy. If these conditions are not met, the operation fails with the ENOTCONN error.

As a part of this change the cluster.quorum-reads volume option is removed, as lookup failure will result in all subsequent operations (including reads) failing, which makes this option redundant.

Ensuring this strictness also helps prevent a long-standing rename-leading-to-data-loss bug, Bug#1366818, by disallowing lookups (and thus renames) when a good copy is not available.

Note: These checks do not affect replica 2 volumes, where lookups work as before, even when only 1 brick is online.
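The rule described above can be written down as a small decision function. This is a simplified model for illustration, not the AFR implementation: it assumes quorum means a strict majority of the replica count, and it treats replica 2 as "any successful reply is enough". The name `lookup_allowed` is hypothetical.

```python
def lookup_allowed(replica_count, replies):
    """Model of the lookup check (illustrative only).

    replies holds one boolean per brick on which the lookup succeeded;
    True means that brick holds a good copy of the file.
    """
    if replica_count == 2:
        # Replica 2 volumes are not affected by the new checks
        # (assumption: one successful reply is sufficient, as before).
        return len(replies) >= 1
    # Assumption: quorum is a strict majority of the bricks in the replica set.
    quorum = replica_count // 2 + 1
    # Both conditions must hold; otherwise the operation fails with ENOTCONN.
    return len(replies) >= quorum and any(replies)
```

For example, in a replica-3 volume a lookup answered by two bricks, one of which holds a good copy, is allowed, while a lookup answered by a single brick is not, even if that brick holds a good copy.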

Further reference: mailing list discussions on topic

Support for max-port range in glusterd.vol

Notes for users:

The glusterd configuration now provides an option to control the number of ports that can be used by gluster daemons on a node.

The option is named "max-port" and can be set in the glusterd.vol file per-node to the desired maximum.

Prevention of other processes accessing the mounted brick snapshots

Notes for users: Snapshots of gluster bricks are now mounted only when the snapshot is active or when it is restored. Previously, snapshots of gluster volumes were mounted by default across the entire life cycle of the snapshot.

This behavior is transparent to users and managed by the gluster processes.

Enabling thin client

Notes for users: The Gluster client stack encompasses the cluster translators (such as distribution and replication or disperse), in addition to the usual caching translators on the client stack. In certain cases this makes the client footprint larger than is sustainable and also requires frequent client updates.

The thin client feature moves the clustering translators (distribute and the translators below it) and a few caching translators to a managed protocol endpoint (called gfproxy) on the gluster server nodes, thus thinning the client stack.

Usage:

# gluster volume set <volname> config.gfproxyd enable

The above enables the gfproxy protocol service on the server nodes. To mount a client that interacts with this end point, use the --thin-client mount option.

Example:

# glusterfs --thin-client --volfile-id=<volname> --volfile-server=<host> <mountpoint>

Limitations: This feature is a technical preview in the 3.13.0 release, and will be improved in the upcoming releases.

Ability to reserve back-end storage space

Notes for users: The posix translator is enhanced with an option that reserves disk space on the bricks. This reserved space is not used by client mounts, which prevents disk-full scenarios; this matters because disk expansion or cluster expansion is more tedious to achieve when back-end bricks are full.

When the free space on a brick is less than or equal to the reserved space, mount points using the brick get ENOSPC errors.

The option takes a numeric percentage value and reserves up to that percentage of disk space. The default value is 1 (%) of the brick size. Setting it to 0 (%) disables the feature.

Usage:

# gluster volume set <volname> storage.reserve <number>
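The arithmetic behind the reserve check can be illustrated with a short sketch. It models only the rule stated above (free space less than or equal to the reserved percentage of the brick size); the function name `reserve_limit_reached` and the byte-based inputs are assumptions for illustration and do not reflect how the posix translator measures space internally.

```python
def reserve_limit_reached(total_bytes, free_bytes, reserve_percent=1):
    """Return True when writes would hit ENOSPC due to storage.reserve (model only)."""
    if reserve_percent == 0:
        # A value of 0 disables the reserve feature entirely.
        return False
    # Integer form of: free_bytes <= total_bytes * reserve_percent / 100
    return free_bytes * 100 <= total_bytes * reserve_percent


# A 1000-byte brick with the default 1% reserve keeps 10 bytes back.
at_limit = reserve_limit_reached(1000, 10)
above_limit = reserve_limit_reached(1000, 11)
```

With the default of 1%, a 1 TiB brick would keep roughly 10 GiB in reserve under this model.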

List all the connected clients for a brick and also exported bricks/snapshots from each brick process

Notes for users: The Gluster CLI is enhanced with an option to list all clients connected to a volume (or to all volumes), along with the list of exported bricks and snapshots for the volume.

Usage:

# gluster volume status <volname/all> client-list

Improved write performance with Disperse xlator

Notes for users: The disperse translator has been enhanced to support parallel writes, which improves the performance of write operations on disperse volumes.

This feature is enabled by default and can be toggled using the boolean option 'disperse.parallel-writes'.

Disperse xlator now supports discard operations

Notes for users: This feature enables users to punch holes in files created on disperse volumes.

Usage:

# fallocate -p -o <offset> -l <len> <file_name>

Included details about memory pools in statedumps

Notes for users: For troubleshooting purposes it is sometimes useful to verify the memory allocations done by Gluster. A previous release of Gluster included a rewrite of the memory pool internals. After those changes, statedumps no longer included details about the memory pools.

This version of Gluster adds details about the memory pools in use to the statedump, which makes troubleshooting memory consumption problems much more efficient again.

Limitations: The statedump currently includes no statistics about the actual behavior of the memory pools. This means that the efficiency of the memory pools cannot be verified.

Gluster APIs added to register callback functions for upcalls

Notes for developers: New APIs have been added to allow gfapi applications to register and unregister for upcall events. Along with the list of events they are interested in, applications now have to register a callback function. This routine is invoked asynchronously, in gluster thread context, whenever the backend server sends an upcall.

int glfs_upcall_register (struct glfs *fs, uint32_t event_list,
                          glfs_upcall_cbk cbk, void *data);
int glfs_upcall_unregister (struct glfs *fs, uint32_t event_list);

The libgfapi header files include the complete synopsis of these APIs and their usage.

Limitations: An application can register only a single callback function for all the upcall events it is interested in.

Known Issues: Bug#1515748 GlusterFS server should be able to identify the clients which registered for upcalls and notify only those clients in case of such events

Gluster API added with a `glfs_mem_header` for exported memory

Notes for developers: Memory allocations done in libgfapi that return a structure to the calling application should use GLFS_CALLOC() and friends. Applications can then correctly free the memory by calling glfs_free().

This is implemented with a new glfs_mem_header similar to how the memory allocations are done with GF_CALLOC() etc. The new header includes a release() function pointer that gets called to free the resource when the application calls glfs_free().

The change is a major improvement for allocating and freeing resources in a standardized way that is transparent to libgfapi applications.

Provided a new xlator to delay fops, to aid slow brick response simulation and debugging

Notes for developers: Like the error-gen translator, a new translator that introduces delays for FOPs has been added to the code base. This can help determine issues around slow(er) client responses and enable better qualification of the translator stacks.

For usage refer to this test case.

Major issues

Expanding a gluster volume that is sharded may cause file corruption
- Sharded volumes are typically used for VM images. If such volumes are expanded or possibly contracted (i.e., add/remove bricks and rebalance), there are reports of VM images getting corrupted.
- The last known cause for corruption (Bug #1515434) has a fix with this release. As further testing is still in progress, the issue is retained as a major issue.
- Status of this bug can be tracked here, #1515434

Bugs addressed

Bugs addressed since release-3.12.0 are listed below.

#1248393: DHT: readdirp fails to read some directories.
#1258561: Gluster puts PID files in wrong place
#1261463: AFR : [RFE] Improvements needed in "gluster volume heal info" commands
#1294051: Though files are in split-brain able to perform writes to the file
#1328994: When a feature fails needing a higher opversion, the message should state what version it needs.
#1335251: mgmt/glusterd: clang compile warnings in glusterd-snapshot.c
#1350406: [storage/posix] - posix_do_futimes function not implemented
#1365683: Fix crash bug when mnt3_resolve_subdir_cbk fails
#1371806: DHT :- inconsistent 'custom extended attributes',uid and gid, Access permission (for directories) if User set/modifies it after bringing one or more sub-volume down
#1376326: separating attach tier and add brick
#1388509: gluster volume heal info "healed" and "heal-failed" showing wrong information
#1395492: trace/error-gen be turned on together while use 'volume set' command to set one of them
#1396327: gluster core dump due to assert failed GF_ASSERT (brick_index < wordcount);
#1406898: Need build time option to default to IPv6
#1428063: gfproxy: Introduce new server-side daemon called GFProxy
#1432046: symlinks trigger faulty geo-replication state (rsnapshot usecase)
#1443145: Free runtime allocated resources upon graph switch or glfs_fini()
#1445663: Improve performance with xattrop update.
#1451434: Use a bitmap to store local node info instead of conf->local_nodeuuids[i].uuids
#1454590: run.c demo mode broken
#1457985: Rebalance estimate time sometimes shows negative values
#1460514: [Ganesha] : Ganesha crashes while cluster enters failover/failback mode
#1461018: Implement DISCARD FOP for EC
#1462969: Peer-file parsing is too fragile
#1467209: [Scale] : Rebalance ETA shows the initial estimate to be ~140 days,finishes within 18 hours though.
#1467614: Gluster read/write performance improvements on NVMe backend
#1468291: NFS Sub directory is getting mounted on solaris 10 even when the permission is restricted in nfs.export-dir volume option
#1471366: Posix xlator needs to reserve disk space to prevent the brick from getting full.
#1472267: glusterd fails to start
#1472609: Root path xattr does not heal correctly in certain cases when volume is in stopped state
#1472758: Running sysbench on vm disk from plain distribute gluster volume causes disk corruption
#1472961: [GNFS+EC] lock is being granted to 2 different client for the same data range at a time after performing lock acquire/release from the clients1
#1473026: replace-brick failure leaves glusterd in inconsistent state
#1473636: Launch metadata heal in discover code path.
#1474180: [Scale] : Client logs flooded with "inode context is NULL" error messages
#1474190: cassandra fails on gluster-block with both replicate and ec volumes
#1474309: Disperse: Coverity issue
#1474318: dht remove-brick status does not indicate failures files not migrated because of a lack of space
#1474639: [Scale] : Rebalance Logs are bulky.
#1475255: [Geo-rep]: Geo-rep hangs in changelog mode
#1475282: [Remove-brick] Few files are getting migrated eventhough the bricks crossed cluster.min-free-disk value
#1475300: implementation of fallocate call in read-only xlator
#1475308: [geo-rep]: few of the self healed hardlinks on master did not sync to slave
#1475605: gluster-block default shard-size should be 64MB
#1475632: Brick Multiplexing: Brick process crashed at changetimerecorder(ctr) translator when restarting volumes
#1476205: [EC]: md5sum mismatches every time for a file from the fuse client on EC volume
#1476295: md-cache uses incorrect xattr keynames for GF_POSIX_ACL keys
#1476324: md-cache: xattr values should not be checked with string functions
#1476410: glusterd: code lacks clarity of logic in glusterd_get_quorum_cluster_counts()
#1476665: [Perf] : Large file sequential reads are off target by ~38% on FUSE/Ganesha
#1476668: [Disperse] : Improve heal info command to handle obvious cases
#1476719: glusterd: flow in glusterd_validate_quorum() could be streamlined
#1476785: scripts: invalid test in S32gluster_enable_shared_storage.sh
#1476861: packaging: /var/lib/glusterd/options should be %config(noreplace)
#1476957: peer-parsing.t fails on NetBSD
#1477169: AFR entry self heal removes a directory's .glusterfs symlink.
#1477404: eager-lock should be off for cassandra to work at the moment
#1477488: Permission denied errors when appending files after readdir
#1478297: Add NULL gfid checks before creating file
#1478710: when gluster pod is restarted, bricks from the restarted pod fails to connect to fuse, self-heal etc
#1479030: nfs process crashed in "nfs3_getattr"
#1480099: More useful error - replace 'not optimal'
#1480445: Log entry of files skipped/failed during rebalance operation
#1480525: Make choose-local configurable through volume-set command
#1480591: [Scale] : I/O errors on multiple gNFS mounts with "Stale file handle" during rebalance of an erasure coded volume.
#1481199: mempool: run-time crash when built with --disable-mempool
#1481600: rpc: client_t and related objects leaked due to incorrect ref counts
#1482023: snpashots issues with other processes accessing the mounted brick snapshots
#1482344: Negative Test: glusterd crashes for some of the volume options if set at cluster level
#1482906: /var/lib/glusterd/peers File had a blank line, Stopped Glusterd from starting
#1482923: afr: check op_ret value in __afr_selfheal_name_impunge
#1483058: [quorum]: Replace brick is happened when Quorum not met.
#1483995: packaging: use rdma-core(-devel) instead of ibverbs, rdmacm; disable rdma on armv7hl
#1484215: Add Deepshika has CI Peer
#1484225: [rpc]: EPOLLERR - disconnecting now messages every 3 secs after completing rebalance
#1484246: [PATCH] incorrect xattr list handling on FreeBSD
#1484490: File-level WORM allows mv over read-only files
#1484709: [geo-rep+qr]: Crashes observed at slave from qr_lookup_sbk during rename/hardlink/rebalance cases
#1484722: return ENOSYS for 'non readable' FOPs
#1485962: gluster-block profile needs to have strict-o-direct
#1486134: glusterfsd (brick) process crashed
#1487644: Fix reference to readthedocs.io in source code and elsewhere
#1487830: scripts: mount.glusterfs contains non-portable bashisms
#1487840: glusterd: spelling errors reported by Debian maintainer
#1488354: gluster-blockd process crashed and core generated
#1488399: Crash in dht_check_and_open_fd_on_subvol_task()
#1488546: [RHHI] cannot boot vms created from template when disk format = qcow2
#1488808: Warning on FreeBSD regarding -Wformat-extra-args
#1488829: Fix unused variable when TCP_USER_TIMEOUT is undefined
#1488840: Fix guard define on nl-cache
#1488906: Fix clagn/gcc warning for umountd
#1488909: Fix the type of 'len' in posix.c, clang is showing a warning
#1488913: Sub-directory mount details are incorrect in /proc/mounts
#1489432: disallow replace brick operation on plain distribute volume
#1489823: set the shard-block-size to 64MB in virt profile
#1490642: glusterfs client crash when removing directories
#1490897: GlusterD returns a bad memory pointer in glusterd_get_args_from_dict()
#1491025: rpc: TLSv1_2_method() is deprecated in OpenSSL-1.1
#1491670: [afr] split-brain observed on T files post hardlink and rename in x3 volume
#1492109: Provide brick list as part of VOLUME_CREATE event.
#1492542: Gluster v status client-list prints wrong output for multiplexed volume.
#1492849: xlator/tier: flood of -Wformat-truncation warnings with gcc-7.
#1492851: xlator/bitrot: flood of -Wformat-truncation warnings with gcc-7.
#1492968: CLIENT_CONNECT event is not notified by eventsapi
#1492996: Readdirp is considerably slower than readdir on acl clients
#1493133: GlusterFS failed to build while running make
#1493415: self-heal daemon stuck
#1493539: AFR_SUBVOL_UP and AFR_SUBVOLS_DOWN events not working
#1493893: gluster volume asks for confirmation for disperse volume even with force
#1493967: glusterd ends up with multiple uuids for the same node
#1495384: Gluster 3.12.1 Packages require manual systemctl daemon reload after install
#1495436: [geo-rep]: Scheduler help needs correction for description of --no-color
#1496363: Add generated HMAC token in header for webhook calls
#1496379: glusterfs process consume huge memory on both server and client node
#1496675: Verify pool pointer before destroying it
#1498570: client-io-threads option not working for replicated volumes
#1499004: [Glusterd] Volume operations fail on a (tiered) volume because of a stale lock held by one of the nodes
#1499159: [geo-rep]: Improve the output message to reflect the real failure with schedule_georep script
#1499180: [geo-rep]: Observed "Operation not supported" error with traceback on slave log
#1499391: [geo-rep]: Worker crashes with OSError: [Errno 61] No data available
#1499393: [geo-rep] master worker crash with interrupted system call
#1499509: Brick Multiplexing: Gluster volume start force complains with command "Error : Request timed out" when there are multiple volumes
#1499641: gfapi: API needed to set lk_owner
#1499663: Mark test case ./tests/bugs/bug-1371806_1.t as a bad test case.
#1499933: md-cache: Add additional samba and macOS specific EAs to mdcache
#1500269: opening a file that is destination of rename results in ENOENT errors
#1500284: [geo-rep]: Status shows ACTIVE for most workers in EC before it becomes the PASSIVE
#1500346: [geo-rep]: Incorrect last sync "0" during hystory crawl after upgrade/stop-start
#1500433: [geo-rep]: RSYNC throwing internal errors
#1500649: Shellcheck errors in hook scripts
#1501235: [SNAPSHOT] Unable to mount a snapshot on client
#1501317: glusterfs fails to build twice in a row
#1501390: Intermittent failure in tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t on NetBSD
#1502253: snapshot_scheduler crashes when SELinux is absent on the system
#1503246: clean up port map on brick disconnect
#1503394: Mishandling null check at send_brick_req of glusterfsd/src/gf_attach.c
#1503424: server.allow-insecure should be visible in "gluster volume set help"
#1503510: [BitRot] man page of gluster needs to be updated for scrub-frequency
#1503519: default timeout of 5min not honored for analyzing split-brain files post setfattr replica.split-brain-heal-finalize
#1503983: Wrong usage of getopt shell command in hook-scripts
#1505253: Update .t test files to use the new tier commands
#1505323: When sub-dir is mounted on Fuse client,adding bricks to the same volume unmounts the subdir from fuse client
#1505325: Potential use of NULL this variable before it gets initialized
#1505527: Posix compliance rename test fails on fuse subdir mount
#1505663: [GSS] gluster volume status command is missing in man page
#1505807: files are appendable on file-based worm volume
#1506083: Ignore disk space reserve check for internal FOPS
#1506513: stale brick processes getting created and volume status shows brick as down(pkill glusterfsd glusterfs ,glusterd restart)
#1506589: Brick port mismatch
#1506903: Event webhook should work with HTTPS urls
#1507466: reset-brick commit force failed with glusterd_volume_brickinfo_get Returning -1
#1508898: Add new configuration option to manage deletion of Worm files
#1509789: The output of the "gluster help" command is difficult to read
#1510012: GlusterFS 3.13.0 tracker
#1510019: Change default versions of certain features to 3.13 from 4.0
#1510022: Revert experimental and 4.0 features to prepare for 3.13 release
#1511274: Rebalance estimate(ETA) shows wrong details(as intial message of 10min wait reappears) when still in progress
#1511293: In distribute volume after glusterd restart, brick goes offline
#1511768: In Replica volume 2*2 when quorum is set, after glusterd restart nfs server is coming up instead of self-heal daemon
#1512435: Test bug-1483058-replace-brick-quorum-validation.t fails inconsistently
#1512460: disperse eager-lock degrades performance for file create workloads
#1513259: NetBSD port
#1514419: gluster volume splitbrain info needs to display output of each brick in a stream fashion instead of buffering and dumping at the end
#1515045: bug-1247563.t is failing on master
#1515572: Accessing a file when source brick is down results in that FOP being hung
#1516313: Bringing down data bricks in cyclic order results in arbiter brick becoming the source for heal.
#1517692: Memory leak in locks xlator
#1518257: EC DISCARD doesn't punch hole properly
#1518512: Change GD_OP_VERSION to 3_13_0 from 3_12_0 for RFE https://bugzilla.redhat.com/show_bug.cgi?id=1464350
#1518744: Add release notes about DISCARD on EC volume
