NFS


NFS is the most widely used HPC filesystem. It is very easy to set up and performs reasonably well as primary storage for small to medium clusters, and it can even serve larger clusters. One of the most common questions about NFS configuration is how to tune it for performance and management, and which options are typically used. It is therefore important to know the NFS export and mount options, especially when you are facing a performance or functional issue with an NFS mount over the network.

Basic commands

Command          Description                            Runs on
# exportfs -r    Re-export your shares                  Server
# exportfs -a    Export all your shares                 Server
# exportfs -v    Verify the NFS share permissions       Server
$ nfsstat -m     Verify the current NFS mount options   Client
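
For example, running nfsstat -m on a client prints each NFS mount and the options in effect. The output below is only an illustration; the server name, path, and option values are placeholders:

$ nfsstat -m
/mnt/nas from server:/nas
 Flags: rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,hard,proto=tcp,timeo=600,retrans=2,sec=sys,local_lock=none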

NFS export on Server[1]

NFS export options are the permissions applied on the NFS server when an NFS share is created in /etc/exports.

Here are the most common (and important) options that an administrator must understand; the full list is available in the man pages (man 5 exports).

Export options (NFS server defaults in parentheses):

secure / insecure (default: secure)
  NFSv4 uses only port 2049; to check the list of ports used by NFSv3, run rpcinfo -p on the server. With secure, the port number from which the client requests a mount must be lower than 1024. To allow the client to use any available free port, add insecure to the NFS share.

rw / ro (default: rw)
  ro means read-only access to the NFS share; rw means read-write access to the NFS share.

root_squash / no_root_squash (default: root_squash)
  "Squash" literally means to squash (destroy) the power of the remote root user. root_squash prevents remote root users from having superuser (root) privileges on remote NFS-mounted volumes; no_root_squash allows the root user on the NFS client host to access the NFS-mounted directory with the same rights and privileges that the superuser would normally have.

all_squash / no_all_squash (default: no_all_squash)
  all_squash maps all user IDs (UIDs) and group IDs (GIDs) to the anonymous user. It is useful for NFS-exported public FTP directories, news spool directories, and the like.

sync / async (default: sync)
  With sync, replies to requests are sent only after the changes have been committed to stable storage. async allows the NFS server to violate the NFS protocol and reply to requests before any changes made by that request have been committed to stable storage. Using the async option usually improves performance, but at the cost that an unclean server restart (i.e. a crash) can cause data to be lost or corrupted.
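
As a sketch, an /etc/exports that combines the options above might look like the following (the paths and the client network are placeholders):

# home directories: read-write, writes committed before replying, remote root mapped to the anonymous user
/export/home  10.0.0.0/24(rw,sync,root_squash,secure)
# public read-only area: all remote users mapped to the anonymous user
/export/pub   *(ro,sync,all_squash)

Run exportfs -r afterwards to re-export the shares.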

Check exports list and options

# with the following export configured
$ cat /etc/exports
/nas * (rw,sync,no_subtree_check)

$ rpcinfo -p | grep -i nfs
    100003    3   tcp   2049  nfs
    100003    4   tcp   2049  nfs
    100003    3   udp   2049  nfs

# detailed export status; note that the space between '*' and the option list
# means '*' is exported with the default options, which is why the output
# below shows ro and the other defaults rather than rw,sync
$ sudo exportfs -v
/nas   <world>(ro,wdelay,root_squash,no_subtree_check,sec=sys,ro,secure,root_squash,no_all_squash)

NFS mount on Client[2]

# mount -t nfs -o [options] remote:/nfs /mount

Mount options (NFS client defaults in parentheses):

nfsvers=n / vers=n (default: 3)
  The version of the NFS protocol to use. By default, the local NFS client attempts to mount the file system using NFS version 3; if the NFS server does not support version 3, the file system is mounted using version 2. If you know that the NFS server does not support version 3, specify vers=2 to save time during the mount, because the client will not attempt to use version 3 before falling back to version 2.

rw / ro (default: rw)
  • Use rw (read/write) for data that users need to modify. For you to mount a directory read/write, the NFS server must export it read/write.
  • Use ro (read-only) for data you do not want users to change. A directory that is automounted from several servers should be read-only, to keep versions identical on all servers.

suid / nosuid (default: suid)
  • Specify suid if you want to allow mounted programs that have setuid permission to run with the permissions of their owners, regardless of who starts them. If a program with setuid permission is owned by root, it will run with root permissions, regardless of who starts it.
  • Specify nosuid to protect your system against setuid programs that may run as root and damage your system.

hard / soft (default: hard)
  • Specify hard if users will be writing to the mounted directory or running programs located in it. When NFS tries to access a hard-mounted directory, it keeps trying until it succeeds or someone interrupts its attempts. If the server goes down, any processes using the mounted directory hang until the server comes back up, and then continue processing without errors. Interruptible hard mounts may be interrupted with CTRL-C or kill (see the intr option, later).
  • Specify soft if the server is unreliable and you want to prevent systems from hanging when the server is down. When NFS tries to access a soft-mounted directory, it gives up and returns an error message after trying retrans times (see the retrans option, later). Any processes using the mounted directory will return errors if the server goes down.

nconnect=<value> (maximum: 16)
  With an NFS-over-TCP mount of one or more NFS shares from an individual NFS server, the traditional behavior is that all those mounts share one TCP connection if they use the same NFS protocol version. Under a high NFS workload at the client, this connection sharing may result in lower performance or unnecessary bottlenecks.[3] In Linux kernel 5.3 (and higher), the nconnect option allows multiple TCP connections for a single NFS mount.

intr / nointr (default: intr)
  • Specify intr if users are not likely to damage critical data by manually interrupting an NFS request. If a hard mount is interruptible, a user may press CTRL-C or issue the kill command to interrupt an NFS mount that is hanging indefinitely because a server is down.
  • Specify nointr if users might damage critical data by manually interrupting an NFS request, and you would rather have the system hang while the server is down than risk losing data between the client and the server.
  Note: in Linux kernel 2.6.25 (and higher), the intr and nointr mount options are deprecated. If you use the hard option on modern Linux kernels, you must use the kill -9 (SIGKILL) command to interrupt a stuck NFS mount.

fg / bg (default: fg)
  • Specify fg (foreground) for directories that are necessary for the client machine to boot or operate correctly. If a foreground mount fails, it is retried in the foreground until it succeeds or is interrupted. All automounted directories are mounted in the foreground; you cannot specify the bg option with automounted directories.
  • Specify bg (background) for directories that are not necessary for the client to boot or operate correctly. Background mounts that fail are retried in the background, allowing the mount process to consider the mount complete and go on to the next one. If you have two machines configured to mount directories from each other, configure the mounts on one of the machines as background mounts; that way, if both systems try to boot at once, they will not become deadlocked, each waiting to mount directories from the other. The bg option cannot be used with automounted directories.

devs / nodevs (default: devs)
  • Specify devs if you are mounting device files from a server whose device files will work correctly on the client. The devs option allows you to use NFS-mounted device files to read and write to devices from the NFS client. It is useful for maintaining a standard, centralized set of device files, if all your systems are configured similarly.
  • Specify nodevs if device files mounted from a server will not work correctly for reading and writing to devices on the NFS client. The nodevs option generates an error if a process on the NFS client tries to read or write to an NFS-mounted device file.

timeo=n (default: 0.7, i.e. 0.07 seconds)
  The timeout, in tenths of a second, that the NFS client waits on the NFS server before retransmitting a packet (no ACK received); if timeo is 5, the NFS client waits 0.5 seconds before retransmitting. If an NFS request (a read or write request to a mounted directory) times out, the timeout value is doubled and the request is retransmitted. After the NFS request has been retransmitted the number of times specified by the retrans option, a soft mount returns an error and a hard mount retries the request. The maximum timeo value is 30 (3 seconds). Try doubling the timeo value if you see several "server not responding" messages within a few minutes; this can happen because you are mounting directories across a gateway, because your server is slow, or because your network is busy with heavy traffic.

retrans=n (default: 4)
  The number of times the NFS client retransmits an NFS request (a read or write request to a mounted directory) after it times out, waiting the timeo interval between tries. If the value is 5, the client resends the RPC packet five times; if the NFS server does not respond after the last attempt, you get a "server not responding" message. If the request does not succeed after n retransmissions, a soft mount returns an error and a hard mount retries the request. Increase the retrans value for a directory that is soft-mounted from a server that has frequent, short periods of downtime; this gives the server sufficient time to recover, so the soft mount does not return an error.

retry=n (default: 1)
  The number of times the NFS client attempts to mount a directory after the first attempt fails. If you specify intr, you can interrupt the mount before n retries; if you specify nointr, you must wait until n retries have been made, until the mount succeeds, or until you reboot the system. If mounts are failing because your server is very busy, increasing the retry value may fix the problem.

rsize=n (default: 8192)
  The number of bytes the NFS client requests from the NFS server in a single read request. If packets are being dropped between the client and the server, decrease rsize to 4096 or 2048. To find out whether packets are being dropped, issue the nfsstat -rc command at the HP-UX prompt; if the timeout and retrans values returned by this command are high but the badxid number is close to zero, packets are being dropped somewhere in the network.

wsize=n (default: 8192)
  The number of bytes the NFS client sends to the NFS server in a single write request. If packets are being dropped between the client and the server, decrease wsize to 4096 or 2048, using the same nfsstat -rc diagnosis as for rsize above.

-O (overlay mount; default: not set)
  Allows the file system to be mounted over an existing mount point, making the underlying file system inaccessible. If you attempt to mount a file system over an existing mount point without the -O option, the mount fails with the error "device busy".
  Caution: using the -O mount option can put your system in a confusing state. The -O option allows you to hide local data under an NFS mount point without receiving any warning, and local data hidden beneath an NFS mount point will not be backed up during regular system backups.
  On HP-UX, the -O option is valid only for NFS-mounted file systems. For this reason, if you specify the -O option, you must also specify the -F nfs option to the mount command or the nfs file system type in the /etc/fstab file.

remount (default: not set)
  If the file system is mounted read-only, this option remounts it read/write. This allows you to change the access permissions from read-only to read/write without forcing everyone to leave the mounted directory or killing all processes using it.

noac (default: not set)
  Prevents the NFS client from caching attributes for the mounted directory. Specify noac for a directory that will be used frequently by many NFS clients; it ensures that the file and directory attributes on the server are up to date, because no changes are cached on the clients. However, if many NFS clients using the same NFS server all disable attribute caching, the server may become overloaded with attribute requests and updates. You can also use the actimeo option to set all the caching timeouts to a small number of seconds, like 1 or 3. If you specify noac, do not specify the other caching options.

nocto (default: not set)
  Suppresses the retrieval of fresh attributes when opening a file. Specify nocto for a file or directory that never changes, to decrease the load on your network.

acdirmax=n (default: 60)
  The maximum number of seconds a directory's attributes are cached on the NFS client. When this timeout period expires, the client flushes its attribute cache, and if the attributes have changed, the client sends them to the NFS server. For a directory that rarely changes or that is owned and modified by only one user, like a user's home directory, you can decrease the load on your network by setting acdirmax=120 or higher.

acdirmin=n (default: 30)
  The minimum number of seconds a directory's attributes are cached on the NFS client. If the directory is modified before this timeout expires, the timeout period is extended by acdirmin seconds. For a directory that rarely changes or that is owned and modified by only one user, like a user's home directory, you can decrease the load on your network by setting acdirmin=60 or higher.

acregmax=n (default: 60)
  The maximum number of seconds a file's attributes are cached on the NFS client. When this timeout period expires, the client flushes its attribute cache, and if the attributes have changed, the client sends them to the NFS server. For a file that rarely changes or that is owned and modified by only one user, like a file in a user's home directory, you can decrease the load on your network by setting acregmax=120 or higher.

actimeo=n (default: not set)
  Setting actimeo to n seconds is equivalent to setting acdirmax, acdirmin, acregmax, and acregmin to n seconds. Set actimeo=1 or actimeo=3 for a directory that is used and modified frequently by many NFS clients; this ensures that the file and directory attributes are kept reasonably up to date, even if they are changed frequently from various client locations. Set actimeo=120 or higher for a directory that rarely or never changes. If you set the actimeo value, do not set the acdirmax, acdirmin, acregmax, or acregmin values.

grpid (default: not set)
  Forces a newly created file in the mounted file system to inherit the group ID of the parent directory. By default, a newly created file inherits the effective group ID of the calling process, unless the setgid bit is set on the parent directory; if the setgid bit is set, the new file inherits the group ID of the parent directory.

lock / nolock (default: lock)
  Selects whether to use the NLM sideband protocol to lock files on the server. If neither option is specified (or if lock is specified), NLM locking is used for this mount point. When the nolock option is used, applications can lock files, but such locks provide exclusion only against other applications running on the same client; remote applications are not affected by these locks. NLM locking must be disabled with the nolock option when using NFS to mount /var, because /var contains files used by the NLM implementation on Linux. The nolock option is also required when mounting exports on NFS servers that do not support the NLM protocol.

local_lock=mechanism (default: none; supported in kernels 2.6.37 and later)
  Specifies whether to use local locking for either or both of the flock and POSIX locking mechanisms; mechanism can be one of all, flock, posix, or none. The Linux NFS client provides a way to make locks local: applications can then lock files, but such locks provide exclusion only against other applications running on the same client, and remote applications are not affected by these locks.
  If all is specified, the client assumes that both flock and POSIX locks are local.
  If flock is specified, the client assumes that only flock locks are local and uses the NLM sideband protocol to lock files when POSIX locks are used.
  If posix is specified, the client assumes that POSIX locks are local and uses the NLM sideband protocol to lock files when flock locks are used.
  To support legacy flock behavior similar to that of NFS clients older than 2.6.12, use Samba, as Samba maps Windows share-mode locks as flock. Since NFS clients newer than 2.6.12 implement flock by emulating POSIX locks, this would otherwise result in conflicting locks.
  NOTE: when used together, the local_lock mount option is overridden by the nolock/lock mount options.
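
Putting several of these options together, an /etc/fstab entry for a Linux client might look like the sketch below (the server name, export path, and mount point are placeholders, and the values are illustrative rather than recommendations):

# NFSv4.2, 1MB transfers, 4 TCP connections, hard mount
server:/nas  /mnt/nas  nfs  vers=4.2,rw,hard,nconnect=4,rsize=1048576,wsize=1048576,timeo=600,retrans=2,_netdev  0  0

After editing /etc/fstab, mount it with "sudo mount /mnt/nas" and verify the options in effect with "nfsstat -m".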

Optimizing NFS Performance[4]

Tuning for performance is a loaded question because performance is defined by so many different variables, the most important of which is how to measure performance.

Tuning options (grouped by target):

Synchronous vs asynchronous (target: NFS performance)
  See the sync/async export options above. To take effect, remount the existing mount point. Recommendation: sync, for data integrity.

Number of NFS daemons (nfsd)
  One way to determine whether more NFS threads would help performance is to check the data in /proc/net/rpc/nfsd (on Ubuntu) for the load on the NFS daemons. The output line that starts with th lists the number of threads, and the last 10 numbers are a histogram of the number of seconds the first 10% of threads were busy, the second 10%, and so on (a separate page explains how to parse the full contents of /proc/net/rpc/nfsd). Ideally, you want the last two numbers to be zero or close to zero, indicating that the threads are busy and you are not "wasting" any threads. If the last two numbers are fairly high, you should add NFS daemons, because the NFS server has become the bottleneck; if the last two, three, or four numbers are zero, then some threads are probably not being used.[5]
  On Ubuntu, RPCNFSDCOUNT in the /etc/default/nfs-kernel-server file sets the number of NFS daemons for the server. In addition, when tuning how many threads are needed, you can look at /proc/fs/nfsd/pool_stats.
  To take effect, reboot the system.[6] Recommendation: 256 threads on a 16-core/128 GB server, 64 threads on an 8-core/64 GB server.
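
A sketch of checking and raising the thread count on an Ubuntu NFS server (the histogram values below are illustrative):

# check the thread-usage histogram; the line starts with 'th'
$ grep th /proc/net/rpc/nfsd
th 8 0 1572.26 83.71 11.07 0.00 0.00 0.00 0.00 0.00 9.32 14.63

# the last numbers are non-zero, so raise the daemon count and restart the
# NFS server (a full reboot, as noted above, also works)
$ sudo sed -i 's/^RPCNFSDCOUNT=.*/RPCNFSDCOUNT=64/' /etc/default/nfs-kernel-server
$ sudo systemctl restart nfs-kernel-server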

Block size setting
  Two NFS client options specify the size of the data chunks for writing (wsize) and reading (rsize). If you don't specify the chunk sizes, the defaults are determined by the versions of NFS and the kernel being used. The best way to check the current chunk size is to run the following command on the NFS client and look for the wsize and rsize values:

$ cat /proc/mounts
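
For instance, to check a single mount (the mount point is a placeholder and the values are illustrative):

$ grep /mnt/nas /proc/mounts
server:/nas /mnt/nas nfs4 rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,hard,proto=tcp,timeo=600,retrans=2 0 0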

Timeout and retransmission
  On congested networks, you often see retransmissions of RPC packets. A good way to tell is to run the nfsstat -r command and look for the column labeled retrans. If the number is large, the network is likely very congested; in that case, you might want to increase the values of timeo and retrans to increase the number of tries and the amount of time between RPC tries. Although taking this action will slow down NFS performance, it might help even out the network traffic so that congestion is reduced. In my experience, getting rid of congestion and dropped packets can result in better, more even performance.
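
A sketch of that workflow on a client (the counter values are illustrative):

# look at the retrans column of the client RPC statistics
$ nfsstat -r
Client rpc stats:
calls      retrans    authrefrsh
1482918    6501       1482920

# if retrans is large, unmount and remount with a longer timeout and more retries
$ sudo umount /mnt/nas
$ sudo mount -t nfs -o timeo=1200,retrans=5 server:/nas /mnt/nas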

FS-Cache
  The FS-Cache facility caches NFS client requests on a local storage device, such as a hard drive or SSD, helping improve NFS read I/O: for data that resides on the local NFS client, the NFS server does not have to be contacted. To use NFS caching, you must enable it explicitly by adding the option -o fsc to the mount command or in /etc/fstab:

# mount <nfs-share>:/ </mount/point> -o fsc

  Any data access to </mount/point> will then go through the NFS cache unless the file is opened for direct I/O or a write I/O is performed. The important thing to remember is that FS-Cache only works if the I/O is a read; it cannot help with direct I/O (read or write) or write requests. However, there are plenty of cases in which FS-Cache can help. For example, if you have an application that needs to read from a database or file and you are running a large number of copies of the same application, FS-Cache might help, because each node can cache the database or file.
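
On Linux, FS-Cache is backed by the cachefilesd daemon, so a sketch of the setup might look like this (the package, service, and file names assume Ubuntu's packaging):

# install and enable the local cache daemon
$ sudo apt install cachefilesd
$ sudo sed -i 's/^#RUN=yes/RUN=yes/' /etc/default/cachefilesd
$ sudo systemctl enable --now cachefilesd

# mount with caching and confirm the share is listed with FSC enabled
$ sudo mount -t nfs -o fsc server:/nas /mnt/nas
$ cat /proc/fs/nfsfs/volumes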

Filesystem-independent mount options
  The Linux mount command has a number of options that are independent of the filesystem and might improve performance:
  • noatime – inode access times are not updated on the filesystem. This can help performance because the access time of the file is not updated every time a file is accessed.
  • nodiratime – the directory inode is not updated on the filesystem when it is accessed. This can help performance in the same way as not updating the file access time.
  • relatime – inode access times are relative to the modify or change time for the file, so the access time is updated only if the previous atime (access time) was earlier than the modify or change time.
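
These options go in the same option list as the NFS-specific ones, for example in /etc/fstab (the paths are placeholders):

server:/nas  /mnt/nas  nfs  vers=4.2,hard,noatime,nodiratime  0  0
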
System memory (target: system tuning)
  If you choose to use asynchronous NFS mode, you will need more memory to take advantage of async, because the NFS server will first store the I/O request in memory, respond to the NFS client, and then retire the I/O by having the filesystem write it to stable storage. Therefore, you need as much memory as possible to get the best performance.

MTU
  Changing the network MTU (maximum transmission unit) is also a good way to affect performance, although it is not an NFS tunable. The MTU size can be very important because it determines packet fragmentation on the network: if your chunk size is 8 KB and the MTU is 1500, it takes six Ethernet frames to transmit the 8 KB; if you increase the MTU to 9000 (9,000 bytes), the number of Ethernet frames drops to one. A study by Dell a few years back examined the effect of an MTU of 1500 compared with an MTU of 9000; using Netperf, they found that the bandwidth increased by about 33% when an MTU of 9000 was used. Fortunately, most switches can accommodate an MTU of 9000 (commonly called "jumbo frames").
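
A quick way to test jumbo frames (the interface name is a placeholder; make the change persistent in your network configuration, e.g. netplan on Ubuntu):

# set the MTU temporarily, then verify a 9000-byte frame passes unfragmented
$ sudo ip link set dev eth0 mtu 9000
$ ping -M do -s 8972 server    # 8972 = 9000 minus 28 bytes of IP/ICMP headers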

TCP tuning on the server
  The NFS daemons on the NFS server share the same socket input and output queues, so if the queues are larger, all of the NFS daemons have more buffer and can send and receive data much faster. The values to increase are /proc/sys/net/core/rmem_default (the default size of the read queue in bytes) and /proc/sys/net/core/rmem_max (the maximum size of the read queue in bytes), together with the corresponding write-queue values wmem_default and wmem_max. To make the values survive reboots, enter them in the proper form in the /etc/sysctl.conf file or in a file under /etc/sysctl.d/.
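
A sketch of both steps (the 1MB values are illustrative, not a recommendation):

# apply immediately
$ sudo sysctl -w net.core.rmem_default=1048576 net.core.rmem_max=1048576
$ sudo sysctl -w net.core.wmem_default=1048576 net.core.wmem_max=1048576

# persist across reboots
$ printf '%s\n' 'net.core.rmem_default=1048576' 'net.core.rmem_max=1048576' \
    'net.core.wmem_default=1048576' 'net.core.wmem_max=1048576' \
    | sudo tee /etc/sysctl.d/90-nfs-server.conf
$ sudo sysctl --system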

Subtree checking (target: NFS management/policy)
  Adding subtree_check to an export on the NFS server forces the server to check that each file being accessed is actually contained within the exported directory. Many people are of the opinion that subtree_check can have a big effect on performance, but the final determination is whether performance is more important than security for your configuration and situation. For security, it is recommended to export directories that sit on a separate partition or separate drive, to prevent a rogue user from guessing a file handle to anything outside the filesystem.

Root squashing
  If you want root to have access to an NFS-mounted filesystem, you can add the no_root_squash option to /etc/exports to allow root access. Just be aware that if someone reboots your system to gain root access, it is then possible for them to copy (steal) data.

Setting Block Size to Optimize Transfer Speeds

The mount command options rsize and wsize specify the size of the chunks of data that the client and server pass back and forth to each other.

Mount Options Example[7]

  • In Linux kernel 5.3 (and higher), the nconnect option allows multiple TCP connections for a single NFS mount. Note: currently, the maximum number of concurrent TCP connections is 16.
  • In Linux kernel 2.6.25 (and higher), the intr and nointr mount options are deprecated. If you use the hard option on modern Linux kernels, you must use the kill -9 (SIGKILL) command to interrupt a stuck NFS mount.

Condition: client with server-side Network Lock Manager (NLM) enabled
mount -t nfs -o rsize=65536,wsize=65536,intr,hard,tcp,rdirplus,readahead=128 server:/path mountpath

Condition: client with local locking enforced
mount -t nfs -o rsize=65536,wsize=65536,intr,hard,tcp,locallocks,rdirplus,readahead=128 server:/path mountpath

(Note: the locallocks and readahead options are not documented in the Linux nfs(5) man page and come from other NFS clients, such as the macOS mount_nfs command; on a Linux client, local locking is selected with local_lock=all, as described above.)


References