Glusterfs

.glusterfs Folder Cleanup

.glusterfs contains a hard link for each file present in the brick. The contents of .glusterfs must not be touched unless you know very well what you are doing. Its apparent size is not real: it mostly consists of hard links to other files, so the net space usage contribution of this directory is very small.[1]

If you use 'du' on the entire volume, you will get the actual used space because 'du' already takes into account hard links.

From the root of the brick, run this:

# du -sh * .glusterfs                   # du accounts for hard links, so this shows the real usage
# find .glusterfs -type f -links 1      # check if there are any files that have just 1 link
# find .glusterfs -type f | wc -l       # total number of files under .glusterfs
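
To locate the GFID entry that corresponds to a given file in the brick (the file name below is a placeholder), GNU find can match on the shared inode:

# find .glusterfs -samefile somedir/somefile      # prints the .glusterfs entry hard-linked to the file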

Igor Cicimov (https://icicimov.github.io/blog/high-availability/GlusterFS-orphaned-GFID-hard-links/) describes how to get rid of orphaned GlusterFS GFID hard links:

# find /path-to-brick/.glusterfs -type f -links -2 -exec rm -fv {} \;      # remove GFID files with fewer than 2 links (orphans)
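
Before deleting anything, it is safer to list and count the candidates first; a minimal dry-run sketch (the brick path is a placeholder):

# find /path-to-brick/.glusterfs -type f -links -2 -print      # list orphan candidates without removing anything
# find /path-to-brick/.glusterfs -type f -links -2 | wc -l     # count them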


GlusterFS Performance Tuning

For performance tuning there are no magic values which work on all systems. The defaults in GlusterFS are configured at install time to provide the best performance over mixed workloads. To squeeze performance out of GlusterFS, develop an understanding of the parameters below and how they may be used in your setup.

After making a change, be sure to restart all GlusterFS processes and begin benchmarking the new values.[2]
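
For example, one way to restart a volume's brick processes and run a quick write test afterwards (the volume name, mount point, and file size are illustrative; gluster volume stop prompts for confirmation):

# gluster volume stop ucmvolume
# gluster volume start ucmvolume
# dd if=/dev/zero of=/mnt/ucmvolume/benchfile bs=1M count=1024 conv=fdatasync   # simple sequential write benchmark
# rm /mnt/ucmvolume/benchfile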

General Commands

Check Status

# gluster peer status

# gluster --remote-host=nod1 peer status

# gluster pool list

Probe peer

# gluster peer probe <IP>

Detach peer

# gluster peer detach <node>

# gluster peer detach <node> force

Volume

# gluster volume create <Volume name> transport tcp <brick path>

# gluster volume create ucmvolume nod1:/blok1 nod2:/blok1

# gluster volume list

# gluster volume get <Volume name> all
Option                                  Value                                   
------                                  -----                                   
cluster.lookup-unhashed                 on 
...

# gluster volume reset <Volume name> nfs.disable

# gluster volume set ucmvolume nfs.disable on

# gluster volume status <Volume name>

# gluster volume start <Volume name>

# gluster volume info <Volume name>

Remove bricks

root@nod1:~# gluster volume remove-brick ucmvolume nod1:/blok1 start
volume remove-brick start: success
ID: c6ab64f7-d921-4e07-9350-0524b7d2a613
root@nod1:~# gluster volume remove-brick ucmvolume nod1:/blok1 status
   Node  Rebalanced-files        size      scanned    failures      skipped       status  run time in secs
  ---------  -----------  -----------  -----------  -----------  -----------  ------------   --------------
   localhost            0      0Bytes            0            0            0    completed             0.00
root@nod1:~# gluster volume remove-brick ucmvolume nod1:/blok1 commit
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
volume remove-brick commit: success

# mount -t glusterfs nod1:/ucmvolume /mnt

# mount -t glusterfs nod2:/ucmvolume /mnt -o backupvolfile-server=nod1

# mount -t glusterfs nod2,nod1:/ucmvolume /mnt

# gluster snapshot list
# gluster snapshot status

# gluster snapshot info

Snap Name : build1_GMT-2015.05.26-14.19.01
Snap UUID : 3b3b8e45-5c81-45ff-8e82-fea05b1a516a

        Brick Path        :   nod1:/run/gluster/snaps/25fa0c73a6b24327927c0dc9f4f08dba/brick1/data
        Volume Group      :   ThinGroup
        Brick Running     :   No
        Brick PID         :   N/A
        Data Percentage   :   40.80
        LV Size           :   80.00g


        Brick Path        :   nod2:/run/gluster/snaps/25fa0c73a6b24327927c0dc9f4f08dba/brick2/data
        Volume Group      :   ThinGroup
        Brick Running     :   No
        Brick PID         :   N/A
        Data Percentage   :   40.75
        LV Size           :   80.00g


        Brick Path        :   nod3:/run/gluster/snaps/25fa0c73a6b24327927c0dc9f4f08dba/brick3/data
        Volume Group      :   ThinGroup
        Brick Running     :   No
        Brick PID         :   N/A
        Data Percentage   :   40.82
        LV Size           :   80.00g

# gluster snapshot activate build1_GMT-2015.05.26-14.19.01

# gluster snapshot deactivate build1_GMT-2015.05.26-14.19.01

# mount -t glusterfs localhost:/snaps/build1_GMT-2015.05.26-14.19.01/buildinglab /mnt

root@nod1:~# gluster snapshot clone build-new build1_GMT-2015.05.26-14.19.01
snapshot clone: success: Clone build-new created successfully
root@nod1:~# mount
...
/dev/mapper/ThinGroup-build--new_0 on /srv/buildinglab type btrfs (rw,relatime,nodatasum,nodatacow,space_cache,autodefrag)
/dev/mapper/ThinGroup-build--new_0 on /run/gluster/snaps/25fa0c73a6b24327927c0dc9f4f08dba/brick2 type btrfs (rw,relatime,nodatasum,nodatacow,space_cache,autodefrag)
/dev/mapper/ThinGroup-build--new_0 on /run/gluster/snaps/build-new/brick2 type btrfs (rw,relatime,nodatasum,nodatacow,space_cache,autodefrag)
...


# gluster volume heal <Volume name> enable

# gluster volume heal git info


GlusterFS Configuration

GlusterFS volumes can be configured with multiple settings. These can be set on a volume using the command below, substituting [VOLUME] with the volume to alter, [OPTION] with the parameter name, and [PARAMETER] with the parameter value.

gluster volume set [VOLUME] [OPTION] [PARAMETER]

Example:

gluster volume set myvolume performance.cache-size 1GB
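
The current value of an option can be checked afterwards (a sketch, using the same illustrative volume name):

gluster volume get myvolume performance.cache-size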

Or you can add the parameter to the glusterfs.vol config file.

vi /etc/glusterfs/glusterfs.vol
  • performance.write-behind-window-size – the size in bytes to use for the per-file write-behind buffer. Default: 1MB.
  • performance.cache-refresh-timeout – the time in seconds a cached data file will be kept until data revalidation occurs. Default: 1 second.
  • performance.cache-size – the size in bytes to use for the read cache. Default: 32MB.
  • cluster.stripe-block-size – the size in bytes of the unit that will be read from or written to on the GlusterFS volume. Smaller values are better for smaller files and larger values for larger files. Default: 128KB.
  • performance.io-thread-count – the maximum number of threads used for IO. Higher numbers improve concurrent IO operations, provided your disks can keep up. Default: 16.

Tuning Points[3]

When mounting the underlying storage for GlusterFS, make sure it is configured for the type of workload you have.

Disable performance.client-io-threads on distributed volumes

Red Hat recommends disabling the performance.client-io-threads option on distributed volumes, as this option tends to worsen performance. Run the following command:[4]

$ sudo gluster volume set ucmgv performance.client-io-threads off
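
To confirm the change afterwards (a sketch, using the same volume name):

$ sudo gluster volume get ucmgv performance.client-io-threads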

Enable Metadata cache

  1. Enable metadata caching and cache invalidation (a combined sketch of these steps follows this list):
    1. gluster volume set <volname> group metadata-cache
       This group command enables caching of stat and xattr information of a file or directory. The cache is refreshed every 10 min, and cache invalidation is enabled to ensure cache consistency.
  2. To increase the number of files that can be cached, execute the following command:
    1. gluster volume set <volname> network.inode-lru-limit <n>
       Here n is set to 50000; it can be increased if the number of active files in the volume is very high. Increasing this number increases the memory footprint of the brick processes.
  3. Enable Samba-specific metadata caching:
    1. gluster volume set <volname> cache-samba-metadata on
  4. By default, some xattrs are cached by Gluster, such as capability xattrs, ima xattrs, ACLs, etc. If any other xattrs are used by the application on top of the Gluster storage, execute the following command to add them to the metadata cache list:
    1. gluster volume set <volname> xattr-cache-list "comma separated xattr list"
    2. E.g.: gluster volume set <volname> xattr-cache-list "user.org.netatalk.*,user.swift.metadata"
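
As an illustration, the steps above combined for a hypothetical volume named myvolume (values follow the list above and are not universal recommendations):

gluster volume set myvolume group metadata-cache
gluster volume set myvolume network.inode-lru-limit 50000
gluster volume set myvolume cache-samba-metadata on
gluster volume set myvolume xattr-cache-list "user.org.netatalk.*,user.swift.metadata"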

Directory operations

Along with enabling the metadata caching, the following options can be set to increase performance of directory operations:

Directory listing Performance:

  • Enable parallel-readdir

gluster volume set <VOLNAME> performance.readdir-ahead on
gluster volume set <VOLNAME> performance.parallel-readdir on

File/Directory Create Performance

  • Enable nl-cache

gluster volume set <volname> group nl-cache
gluster volume set <volname> nl-cache-positive-entry on

The above nl-cache group command also enables cache invalidation and increases the timeout to 10 minutes.

Small file Read operations

For use cases dominated by small file reads, enable the following options:

gluster volume set <volname> performance.cache-invalidation on
gluster volume set <volname> features.cache-invalidation on
gluster volume set <volname> performance.qr-cache-timeout 600 # 10 min recommended setting
gluster volume set <volname> cache-invalidation-timeout 600 # 10 min recommended setting

These options enable caching of small file content in the client cache. Enabling cache invalidation ensures cache consistency.

The total cache size can be set using

gluster volume set <volname> cache-size <size>

By default, the files with size <=64KB are cached. To change this value:

gluster volume set <volname> performance.cache-max-file-size <size>

Note that the size arguments use SI unit suffixes, e.g. 64KB or 2MB.
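
For example, with illustrative values rather than recommendations, a 256MB client cache that caches files up to 1MB in size:

gluster volume set <volname> cache-size 256MB
gluster volume set <volname> performance.cache-max-file-size 1MB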

Others

  • When mounting your GlusterFS storage from a remote server to your local server, be sure to disable direct-io, as this enables the kernel read-ahead and file system cache. This is sensible for most workloads, where caching of files is beneficial.
  • When mounting the GlusterFS volume over NFS, use noatime and nodiratime to avoid updating access timestamps over NFS.
  • # gluster volume set $vol performance.io-thread-count 64[5]. Today's CPUs are powerful enough to handle 64 threads per volume.
  • # gluster volume set $vol client.event-threads XX, where XX depends on the number of connections from the FUSE client to the server; you can get this number by running netstat, grepping for the server IP, and counting the connections (see the sketch after this list).
  • # gluster volume set $vol server.event-threads XX, where XX depends on the number of connections from the server to the client(s); you can get this number by running netstat, grepping for "gluster", and counting the connections.
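
A rough sketch of counting those connections and applying the thread settings (the server IP, volume name, and thread counts are placeholders):

# netstat -tn | grep 10.0.0.5 | grep -c ESTABLISHED        # on a FUSE client: connections to the server
# netstat -tnp | grep -i gluster | grep -c ESTABLISHED     # on the server: connections handled by gluster processes
# gluster volume set ucmvolume client.event-threads 4
# gluster volume set ucmvolume server.event-threads 4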

Troubleshooting

Glusterfs log location

/var/log/glusterfs
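
For example, to follow the management daemon log (exact file names vary by GlusterFS version and by mount point):

# tail -f /var/log/glusterfs/glusterd.log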

Peer Rejected (connected) state in gluster peer status

Hostname: <hostname>
Uuid: <xxxx-xxx-xxxx>
State: Peer Rejected (Connected)

Solution 1: Fully sync with the rest of the trusted storage pool[6]

This indicates that the volume configuration on the node is not in sync with the rest of the trusted cluster storage pool.

  1. First, check and compare the peer's UUID between the master's view (gluster peer status on a good node) and /var/lib/glusterd/glusterd.info on the rejected peer. If the UUIDs differ, the master's view of the peer UUID will need to be updated as part of step 2 below.
  2. On the rejected peer (a command sketch follows this list; try the whole procedure a couple more times if it doesn't work right away):
    1. Stop glusterd
    2. In /var/lib/glusterd, delete everything except glusterd.info (the UUID file)
    3. Start glusterd
    4. Probe one of the good peers
    5. Restart glusterd and check 'gluster peer status'
    6. You may need to restart glusterd another time or two; keep checking peer status
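
A minimal command sketch of the procedure above, run on the rejected peer (assumes systemd and a good peer reachable as nod1):

# systemctl stop glusterd
# cd /var/lib/glusterd
# find . -mindepth 1 ! -name 'glusterd.info' -delete      # remove everything except glusterd.info (this node's UUID)
# systemctl start glusterd
# gluster peer probe nod1
# systemctl restart glusterd
# gluster peer status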

Solution 2: Update the cluster.op-version[7]

  • Run gluster volume get all cluster.max-op-version to get the latest supported op-version.
  • Update the cluster.op-version to the latest supported op-version by executing gluster volume set all cluster.op-version <op-version>.

Solution 3: Accepting data loss

If you don't care about the data on the failed peer and just want to resync, the following might work (I take no responsibility for any data loss, etc, etc)[8]

# Remove dead server
gluster volume remove-brick myvolume 10.0.0.5:/brickpath force
gluster peer detach 10.0.0.5
# add peer again
gluster peer probe 10.0.0.5
gluster volume add-brick myvolume 10.0.0.5:/brickpath force

References

[1] https://github.com/gluster/glusterfs/issues/833
[3] https://docs.gluster.org/en/main/Administrator-Guide/Performance-Tuning/
[4] https://access.redhat.com/documentation/ko-kr/red_hat_gluster_storage/3.4/html/administration_guide/creating_distributed_volumes