.glusterfs folder cleanup
.glusterfs contains a hard link for each file present in the brick. The contents of .glusterfs must not be touched unless you know very well what you are doing. Its apparent size is not real: it mostly contains hard links to other files, so its net contribution to space usage is very small.[1]
If you use 'du' on the entire volume, you will get the actual used space because 'du' already takes into account hard links.
From the root of the brick, run this:
# du -sh * .glusterfs
# find .glusterfs -type f -links 1 #check if there are any files that have just 1 link
# find .glusterfs -type f | wc -l
Igor Cicimov (https://icicimov.github.io/blog/high-availability/GlusterFS-orphaned-GFID-hard-links/) describes well how to get rid of orphaned GlusterFS GFID hard links:
# find /path-to-brick/.glusterfs -type f -links -2 -exec rm -fv {} \;
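A more cautious variant of the same cleanup (a sketch, using the same hypothetical brick path) is to list the orphans first, review them, and only then delete:
# find /path-to-brick/.glusterfs -type f -links -2 -print   #dry run: list files with fewer than 2 links, nothing is deleted
# find /path-to-brick/.glusterfs -type f -links -2 -print | wc -l   #count the orphans before deciding to remove them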
GlusterFS Performance Tuning
As with any performance tuning, there are no magic values that work on all systems. The defaults in GlusterFS are configured at install time to provide the best performance over mixed workloads. To squeeze more performance out of GlusterFS, understand the parameters below and how they may apply to your setup.
After making a change, be sure to restart all GlusterFS processes and begin benchmarking the new values.[2]
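A minimal sketch of such a restart-and-benchmark cycle, assuming a hypothetical volume named ucmvolume mounted at /mnt/ucmvolume on a systemd-based host:
# gluster volume stop ucmvolume && gluster volume start ucmvolume   #restarts the brick processes; prompts for confirmation and disrupts clients
# systemctl restart glusterd   #restarts the management daemon
# dd if=/dev/zero of=/mnt/ucmvolume/ddtest bs=1M count=1024 oflag=direct   #crude sequential write benchmark through the client mount
# rm /mnt/ucmvolume/ddtest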
General Commands
Check Status
# gluster peer status
# gluster --remote-host=nod1 peer status
# gluster pool list
Probe peer
# gluster peer probe <IP>
Detach peer
# gluster peer detach <node>
# gluster peer detach <node> force
Volume
# gluster volume create <Volume name> transport tcp <brick path>
# gluster volume create ucmvolume nod1:/blok1 nod2:/blok1
# gluster volume list
# gluster volume get <Volume name> all
Option                                  Value
------                                  -----
cluster.lookup-unhashed                 on
...
# gluster volume reset <Volume name> nfs.disable
# gluster volume set ucmvolume nfs.disable on
# gluster volume status <Volume name>
# gluster volume start <Volume name>
# gluster volume info <Volume name>
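As a worked example (hypothetical hosts and brick paths), creating and starting a two-way replicated volume instead of the default distributed layout; recent GlusterFS releases may warn that replica 2 volumes are prone to split-brain and ask for confirmation:
# gluster volume create ucmvolume replica 2 nod1:/blok1 nod2:/blok1
# gluster volume start ucmvolume
# gluster volume info ucmvolume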
Remove bricks
root@nod1:~# gluster volume remove-brick ucmvolume nod1:/blok1 start
volume remove-brick start: success
ID: c6ab64f7-d921-4e07-9350-0524b7d2a613
root@nod1:~# gluster volume remove-brick ucmvolume nod1:/blok1 status
     Node   Rebalanced-files          size       scanned      failures       skipped         status   run time in secs
---------        -----------   -----------   -----------   -----------   -----------   ------------     --------------
localhost                  0        0Bytes             0             0             0      completed               0.00
root@nod1:~# gluster volume remove-brick ucmvolume nod1:/blok1 commit
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
volume remove-brick commit: success
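The reverse operation, growing the volume, follows the same pattern (hypothetical new brick nod3:/blok1); after adding a brick, a rebalance spreads existing data onto it:
# gluster volume add-brick ucmvolume nod3:/blok1
# gluster volume rebalance ucmvolume start
# gluster volume rebalance ucmvolume status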
# mount -t glusterfs nod1:/ucmvolume /mnt
# mount -t glusterfs nod2:/ucmvolume /mnt -o backupvolfile-server=nod1
# mount -t glusterfs nod2,nod1:/ucmvolume /mnt
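To make the mount persistent across reboots, a hedged /etc/fstab sketch (hypothetical hosts, volume and mount point):
nod1:/ucmvolume  /mnt  glusterfs  defaults,_netdev,backupvolfile-server=nod2  0 0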
# gluster snapshot list
# gluster snapshot status
# gluster snapshot info
Snap Name : build1_GMT-2015.05.26-14.19.01
Snap UUID : 3b3b8e45-5c81-45ff-8e82-fea05b1a516a

        Brick Path        : nod1:/run/gluster/snaps/25fa0c73a6b24327927c0dc9f4f08dba/brick1/data
        Volume Group      : ThinGroup
        Brick Running     : No
        Brick PID         : N/A
        Data Percentage   : 40.80
        LV Size           : 80.00g

        Brick Path        : nod2:/run/gluster/snaps/25fa0c73a6b24327927c0dc9f4f08dba/brick2/data
        Volume Group      : ThinGroup
        Brick Running     : No
        Brick PID         : N/A
        Data Percentage   : 40.75
        LV Size           : 80.00g

        Brick Path        : nod3:/run/gluster/snaps/25fa0c73a6b24327927c0dc9f4f08dba/brick3/data
        Volume Group      : ThinGroup
        Brick Running     : No
        Brick PID         : N/A
        Data Percentage   : 40.82
        LV Size           : 80.00g
# gluster snapshot activate build1_GMT-2015.05.26-14.19.01
# gluster snapshot deactivate build1_GMT-2015.05.26-14.19.01
# mount -t glusterfs localhost:/snaps/build1_GMT-2015.05.26-14.19.01/buildinglab /mnt
root@nod1:~# gluster snapshot clone build-new build1_GMT-2015.05.26-14.19.01
snapshot clone: success: Clone build-new created successfully
root@nod1:~# mount
...
/dev/mapper/ThinGroup-build--new_0 on /srv/buildinglab type btrfs (rw,relatime,nodatasum,nodatacow,space_cache,autodefrag)
/dev/mapper/ThinGroup-build--new_0 on /run/gluster/snaps/25fa0c73a6b24327927c0dc9f4f08dba/brick2 type btrfs (rw,relatime,nodatasum,nodatacow,space_cache,autodefrag)
/dev/mapper/ThinGroup-build--new_0 on /run/gluster/snaps/build-new/brick2 type btrfs (rw,relatime,nodatasum,nodatacow,space_cache,autodefrag)
...
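For completeness, a sketch of creating and restoring a snapshot (assuming a volume named buildinglab whose bricks sit on thinly provisioned LVM; a restore requires the volume to be stopped first):
# gluster snapshot create build1 buildinglab   #the snapshot name gets a GMT timestamp suffix by default
# gluster volume stop buildinglab
# gluster snapshot restore build1_GMT-2015.05.26-14.19.01
# gluster volume start buildinglab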
# gluster volume heal <Volume name> enable
# gluster volume heal git info
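When healing is slow or files look inconsistent, these related commands (sketched for the same volume, git) can help narrow the problem down:
# gluster volume heal git info split-brain       #list entries that are in split-brain
# gluster volume heal git statistics heal-count  #number of entries still pending heal
# gluster volume heal git full                   #trigger a full heal crawl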
GlusterFS Configuration
GlusterFS volumes can be configured with multiple settings. These can be set on a volume using the below command substituting [VOLUME] for the volume to alter, [OPTION] for the parameter name and [PARAMETER] for the parameter value.
gluster volume set [VOLUME] [OPTION] [PARAMETER]
Example:
gluster volume set myvolume performance.cache-size 1GB
Or you can add the parameter to the glusterfs.vol config file.
vi /etc/glusterfs/glusterfs.vol
- performance.write-behind-window-size – the size in bytes to use for the per file write behind buffer. Default: 1MB.
- performance.cache-refresh-timeout – the time in seconds a cached data file will be kept until data revalidation occurs. Default: 1 second.
- performance.cache-size – the size in bytes to use for the read cache. Default: 32MB.
- cluster.stripe-block-size – the size in bytes of the unit that will be read from or written to on the GlusterFS volume. Smaller values are better for smaller files and larger sizes for larger files. Default: 128KB.
- performance.io-thread-count – is the maximum number of threads used for IO. Higher numbers improve concurrent IO operations, providing your disks can keep up. Default: 16.
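A sketch of applying a few of the parameters above to a hypothetical volume named myvolume (the values are illustrative, not recommendations):

gluster volume set myvolume performance.write-behind-window-size 4MB
gluster volume set myvolume performance.cache-refresh-timeout 4
gluster volume set myvolume performance.io-thread-count 32
gluster volume get myvolume performance.io-thread-count   # verify the value took effect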
Tuning Points[3]
When mounting the underlying storage that GlusterFS will use, make sure it is configured for the type of workload you have.
Enable Metadata cache
- Enable metadata caching and cache invalidation:
gluster volume set <volname> group metadata-cache
This group command enables caching of stat and xattr information of a file or directory. The caching is refreshed every 10 min, and cache-invalidation is enabled to ensure cache consistency.
- To increase the number of files that can be cached, execute the following command:
gluster volume set <volname> network.inode-lru-limit <n>
By default, n is set to 50000. It can be increased if the number of active files in the volume is very high. Increasing this number increases the memory footprint of the brick processes.
- Enable samba specific metadata caching:
gluster volume set <volname> cache-samba-metadata on
- By default, some xattrs are cached by gluster, such as capability xattrs, ima xattrs, ACLs, etc. If there are any other xattrs used by the application on top of the Gluster storage, execute the following command to add them to the metadata cache list:
gluster volume set <volname> xattr-cache-list "comma separated xattr list"
- For example:
gluster volume set <volname> xattr-cache-list "user.org.netatalk.*,user.swift.metadata"
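Putting the metadata-cache options together, a minimal sketch for a hypothetical volume named myvolume:

gluster volume set myvolume group metadata-cache
gluster volume set myvolume network.inode-lru-limit 200000
gluster volume set myvolume cache-samba-metadata on    # only if the volume is exported via Samba
gluster volume get myvolume network.inode-lru-limit    # verify the new limit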
Directory operations
Along with enabling the metadata caching, the following options can be set to increase performance of directory operations:
Directory listing Performance:
- Enable parallel-readdir:
gluster volume set <VOLNAME> performance.readdir-ahead on
gluster volume set <VOLNAME> performance.parallel-readdir on
File/Directory Create Performance
- Enable nl-cache:
gluster volume set <volname> group nl-cache
gluster volume set <volname> nl-cache-positive-entry on
The above command also enables cache invalidation and increases the timeout to 10 minutes.
Small file Read operations
For use cases with dominant small file reads, enable the following options
gluster volume set <volname> performance.cache-invalidation on
gluster volume set <volname> features.cache-invalidation on
gluster volume set <volname> performance.qr-cache-timeout 600 # 10 min recommended setting
gluster volume set <volname> cache-invalidation-timeout 600 # 10 min recommended setting
These commands enable caching of the contents of small files in the client cache. Enabling cache invalidation ensures cache consistency.
The total cache size can be set using
gluster volume set <volname> cache-size <size>
By default, files with size <=64KB are cached. To change this value:
gluster volume set <volname> performance.cache-max-file-size <size>
Note that the size arguments use SI unit suffixes, e.g. 64KB or 2MB.
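For instance, a hedged sketch that caches files up to 1MB and grants a 256MB read cache on a hypothetical volume named myvolume:

gluster volume set myvolume performance.cache-max-file-size 1MB
gluster volume set myvolume performance.cache-size 256MB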
Others
- When mounting your GlusterFS storage from a remote server to your local server, be sure to disable direct-io, as this enables the kernel read-ahead and file system cache. This is sensible for most workloads where caching of files is beneficial.
- When mounting the GlusterFS volume over NFS use noatime and nodiratime to remove the timestamps over NFS.
- # gluster volume set $vol performance.io-thread-count 64[4]; today's CPUs are powerful enough to handle 64 threads per volume.
- # gluster volume set $vol client.event-threads XX; XX depends on the number of connections from the FUSE client to the server. You can get this number by running netstat, grepping for the server IP and counting the connections.
- # gluster volume set $vol server.event-threads XX; XX depends on the number of connections from the server to the client(s). You can get this number by running netstat, grepping for "gluster" and counting the connections (see the example below).
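A hedged example of counting those connections with netstat (the server IP 10.0.0.5 is hypothetical; ss can be substituted on newer systems):
# netstat -tn | grep 10.0.0.5 | grep -c ESTABLISHED    #on the client: count established connections to the server IP
# netstat -tnp | grep gluster | grep -c ESTABLISHED    #on the server: count connections handled by gluster processes (-p needs root)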
Troubleshooting
Peer Rejected (connected) state in gluster peer status
Hostname: <hostname>
Uuid: <xxxx-xxx-xxxx>
State: Peer Rejected (Connected)
Solution 1, Fully sync with the rest of the trusted cluster pool[5]
This indicates that the volume configuration on the node is not in sync with the rest of the trusted cluster storage pool.
- First, check and compare the rejected peer's UUID between the master's view (gluster peer status on a good node) and /var/lib/glusterd/glusterd.info on the rejected peer. If the UUIDs differ, update glusterd.info with the UUID from the master's view at step 2 of the following procedure.
- On the rejected peer (a command sketch follows this list; try the whole procedure a couple more times if it doesn't work right away):
- Stop glusterd
- In /var/lib/glusterd, delete everything except glusterd.info (the UUID file)
- Start glusterd
- Probe one of the good peers
- Restart glusterd, check 'gluster peer status'
- You may need to restart glusterd another time or two, keep checking peer status
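A hedged command sketch of the procedure above, assuming systemd and a healthy peer named nod1 (back up /var/lib/glusterd first):
# systemctl stop glusterd
# cp -a /var/lib/glusterd /var/lib/glusterd.bak   #safety copy
# find /var/lib/glusterd -mindepth 1 -maxdepth 1 ! -name glusterd.info -exec rm -rf {} +   #delete everything except glusterd.info
# systemctl start glusterd
# gluster peer probe nod1   #probe one of the good peers
# systemctl restart glusterd
# gluster peer status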
Solution 2, Update the cluster.op-version[6]
- Run gluster volume get all cluster.max-op-version to get the latest supported op-version.
- Update the cluster.op-version to the latest supported op-version by executing gluster volume set all cluster.op-version <op-version>, as in the example below.
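A hedged example (the op-version number is hypothetical and must be taken from the first command's output):

gluster volume get all cluster.max-op-version
gluster volume set all cluster.op-version 70200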
Solution 3, Accepting possible data loss
If you don't care about the data on the failed peer and just want to resync, the following might work (I take no responsibility for any data loss, etc, etc)[7]
# Remove dead server
gluster volume remove-brick myvolume 10.0.0.5:/brickpath force
gluster peer detach 10.0.0.5
# add peer again
gluster peer probe 10.0.0.5
gluster volume add-brick myvolume 10.0.0.5:/brickpath force
References
- ↑ https://github.com/gluster/glusterfs/issues/833
- ↑ https://www.jamescoyle.net/how-to/559-glusterfs-performance-tuning
- ↑ https://docs.gluster.org/en/main/Administrator-Guide/Performance-Tuning/
- ↑ https://www.spinics.net/lists/gluster-users/msg24680.html
- ↑ https://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Administrator%20Guide/Resolving%20Peer%20Rejected/?q=remove+blick&check_keywords=yes&area=default
- ↑ https://gitlab.ito.umt.edu/zr139734e/glusterdocs/-/blob/v3/Troubleshooting/troubleshooting-glusterd.md
- ↑ https://github.com/gluster/glusterfs/issues/159