Beware, biting bsize
There are a lot of suggestions to increase the rsize and wsize parameters for NFS to get better performance when doing large transfers. The idea is that NFS then transmits larger chunks at once, which improves performance. However, there is a step you have to take first if you want to increase both parameters.
The situation
Okay, let’s start with a small size. I prepared an NFS server VM with a single share on 10.10.10.1. To start, I’m mounting the share with an 8k rsize/wsize.
Now I’m running a small test with dd in the mounted directory.
Okay. I’m using a small dtrace script to find out which chunk sizes the client used to transport the data, by looking at the requests hitting the server. It’s a really simple dtrace script.
I should mention that I started it before running the dd and stopped it afterwards.
Okay, the client transmitted 512 chunks of 8k to the server. As expected: 512 times 8k is 4 megabytes, which matches the 4 times 1024k written by dd. Now we repeat the same test with 32k chunks.
128 chunks of 32k. Again … as expected. Okay … let’s test bigger chunks: 1 megabyte.
WTF? Still 128 chunks with 32k each? The setting had no impact.
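The expectation behind these tests is plain division. A minimal Python sketch (the helper name `expected_chunks` is hypothetical, and it assumes the dd writes 4 times 1024k, as described above) shows what each mount setting should have produced. It matches the 8k and 32k runs, but predicts 4 chunks for the 1M mount instead of the 128 the dtrace script reported.

```python
# Naive expectation: number of chunks = bytes written / requested rsize/wsize.
TOTAL = 4 * 1024 * 1024  # assumption: dd writes 4 x 1024k = 4 MiB

def expected_chunks(total_bytes: int, chunk_bytes: int) -> int:
    """Chunks needed to move total_bytes in pieces of chunk_bytes (rounded up)."""
    return -(-total_bytes // chunk_bytes)

for label, size in [("8k", 8 * 1024), ("32k", 32 * 1024), ("1M", 1024 * 1024)]:
    print(f"wsize={label:>3}: {expected_chunks(TOTAL, size)} chunks of {size} bytes")
```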
Solution
So … what’s the issue? Just because you specified an rsize and a wsize doesn’t mean that those chunk sizes are actually used. There are also system defaults that limit the size of the chunks transported by NFS.
The parameters are called nfs:nfs3_bsize and nfs:nfs3_max_transfer_size. The bsize defaults to 32768. In Solaris 10, max_transfer_size defaults to 32768; in Solaris 11 it was changed to 1M. For NFSv4 there are separate settings, nfs:nfs4_bsize and nfs:nfs4_max_transfer_size.
So by default, the maximum size of a chunk of data transported by NFS is 32768 bytes, no matter what you use as wsize and rsize. You have to change these defaults: nfs:nfs3_bsize has to be equal to or larger than the maximum rsize or wsize you specify on the client, and nfs:nfs3_max_transfer_size has to be equal to or larger than nfs:nfs3_bsize.
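The interplay of the two limits can be modelled in a few lines. This is a simplified sketch, not the actual kernel logic: it assumes the chunk actually used is the smallest of the requested size and the two tunables, and it encodes the two rules stated above. The class and function names are hypothetical; the default values are the ones given in the text. With the defaults of either release, a 1M rsize/wsize still ends up as 32768-byte chunks, because bsize stays at 32768 even on Solaris 11.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NfsTunables:
    bsize: int              # nfs:nfs3_bsize
    max_transfer_size: int  # nfs:nfs3_max_transfer_size

# Defaults as described in the text.
SOLARIS10_DEFAULTS = NfsTunables(bsize=32768, max_transfer_size=32768)
SOLARIS11_DEFAULTS = NfsTunables(bsize=32768, max_transfer_size=1024 * 1024)

def effective_chunk(requested: int, t: NfsTunables) -> int:
    """Simplified model: the chunk size used is capped by both tunables."""
    return min(requested, t.bsize, t.max_transfer_size)

def tunables_allow(requested: int, t: NfsTunables) -> bool:
    """Rules from the text: bsize >= rsize/wsize and max_transfer_size >= bsize."""
    return t.bsize >= requested and t.max_transfer_size >= t.bsize

REQUESTED = 1024 * 1024  # the 1M rsize/wsize from the third test
for name, t in [("Solaris 10 defaults", SOLARIS10_DEFAULTS),
                ("Solaris 11 defaults", SOLARIS11_DEFAULTS),
                ("both raised to 1M", NfsTunables(REQUESTED, REQUESTED))]:
    print(f"{name}: chunk={effective_chunk(REQUESTED, t)}, "
          f"allowed={tunables_allow(REQUESTED, t)}")
```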
One of the reasons for this is the amount of memory you have to allocate for the communication. 32768 is a compromise between minimising memory allocation and maximising performance for sequential loads: 8k would be better for memory allocation, 1M is better for sequential performance.
You can change the defaults. The easiest way is to set both parameters in /etc/system and reboot.
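As a sketch of what such entries look like, the following Python helper (hypothetical, not part of any Solaris tooling) prints lines in the `set module:variable=value` form that /etc/system uses, choosing the smallest values that satisfy the two rules above for a given maximum rsize/wsize. One assumption to keep in mind: on Solaris 11, where max_transfer_size already defaults to 1M, setting it equal to a smaller bsize would lower it, so in that case you may prefer to leave that line out. A reboot is still required afterwards.

```python
def etc_system_lines(max_rw_size: int, version: int = 3) -> list[str]:
    """Build /etc/system entries so that bsize >= max_rw_size and
    max_transfer_size >= bsize (the two rules described in the text)."""
    if max_rw_size <= 0:
        raise ValueError("rsize/wsize must be positive")
    if version not in (3, 4):
        raise ValueError("only the NFSv3 and NFSv4 tunables are covered")
    bsize = max_rw_size   # bsize must cover the largest rsize/wsize
    max_transfer = bsize  # equal to bsize is the minimum that satisfies the rule
    prefix = f"nfs:nfs{version}"
    return [
        f"set {prefix}_bsize={bsize}",
        f"set {prefix}_max_transfer_size={max_transfer}",
    ]

# Print the entries for a 1M rsize/wsize, ready to append to /etc/system.
for line in etc_system_lines(1024 * 1024):
    print(line)
```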
Okay, let’s do the test again.
The dtrace script now shows a different output.
As expected, you see 4 chunks with a size of 1048576 bytes, as configured by rsize and wsize. By the way: while an rsize/wsize of 1M may give you the best single-user sequential read/write performance, my experience so far suggests that 128k is a much better choice when several users are working in the mounted directory in parallel, because smaller chunks allow the system to share the network link much better among all the requests initiated by the users.
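A back-of-the-envelope way to see why smaller chunks share a link better: while one chunk is being serialized onto the wire, requests queued behind it have to wait. The Python sketch below computes that serialization time per chunk. It assumes a 1 Gbit/s link and ignores protocol overhead, so it is an illustration of the reasoning, not a measurement. Under those assumptions a 1M chunk occupies the link for roughly 8.4 ms, a 128k chunk for about 1 ms.

```python
def serialization_ms(chunk_bytes: int, link_bits_per_s: float) -> float:
    """Time one chunk occupies the wire in milliseconds, ignoring protocol overhead."""
    return chunk_bytes * 8 / link_bits_per_s * 1000

GIGABIT = 1e9  # assumption: 1 Gbit/s link
for size in (32 * 1024, 128 * 1024, 1024 * 1024):
    print(f"{size // 1024:>5}k chunk: {serialization_ms(size, GIGABIT):.2f} ms on the wire")
```

Doubling the link speed halves every figure, but the ratio between chunk sizes, and thus how long other users' requests may be stuck behind a single large chunk, stays the same.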