Less known Solaris 11.1 features: A user in 1024 groups and a workaround for a 25 year old problem
For a long time the maximum number of groups a user could belong to was 16, although there was a way to get 32. In Solaris 11 and recent versions of Solaris 10, the maximum number of groups a user can belong to is 1024 (which is the same limit Windows sets in this regard). It’s easy to set the new limit: raise the ngroups_max tunable in /etc/system, for example with the line set ngroups_max=1024.
After a reboot, this change will be active. But why isn’t this the default? There are good reasons for it. I will show you one of them in this entry. Like using two digits for the year or a signed 32-bit integer for storing the system time, the issue has its root cause in a decision made a long time ago … in this example the moment in the past is at least 25 years ago. And often just changing something breaks stuff that is really old, but still in use.
Experienced Solaris users who tuned their Solaris system for up to 32 groups per user already know the component that will be broken by having more than 16 groups, because the system prints a warning at the next boot after the change in /etc/system. It’s about the problem that many people encounter when using NFS with one user in more than 16 groups (it’s not an NFS problem, but I will explain that).
However, as I already said, there has been a solution for this problem since Solaris 11.1. This blog entry will show the workaround in action.
What is the problem?
The problem is NFS, or to be exact, a mechanism used by NFS. So it isn’t a problem of NFS itself, it’s a problem when using NFS in conjunction with the AUTH_SYS security mechanism defined in the ONC RPC specification, as NFS depends on the mechanisms provided by RPC for user authentication and user identification.
The security mechanism AUTH_SYS (or AUTH_UNIX, as it is sometimes called) doesn’t accept more than 16 groups for a user. It just ignores any groups beyond that by not transmitting them. The protocol cannot pass more than 16 group IDs as groups to the server due to its specification:
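RFC 1831 specifies the AUTH_SYS credentials in appendix A as the structure authsys_parms, which carries a stamp, the machine name, the uid, the gid and, as its last field, the supplementary groups declared as gids<16>. The problem is that last field: it is an array bounded to 16 entries. The 16 group limit with AUTH_SYS originates from that part.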
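To make the limit a bit more tangible, here is a minimal Python sketch of how an AUTH_SYS credential body could be packed in XDR. The field order follows authsys_parms from RFC 1831. The helper names, the sample uid and gid, and the choice to keep the first 16 group IDs are assumptions for illustration, not a description of what any particular NFS client does.

```python
import struct

# Upper bound of the gids<16> array in authsys_parms (RFC 1831, appendix A).
MAX_AUTH_SYS_GIDS = 16


def xdr_uint(value):
    """Encode an unsigned 32-bit integer in XDR (big-endian)."""
    return struct.pack(">I", value)


def xdr_string(text):
    """Encode a string in XDR: length, bytes, zero padding to a 4-byte boundary."""
    data = text.encode("ascii")
    padding = (4 - len(data) % 4) % 4
    return xdr_uint(len(data)) + data + b"\x00" * padding


def encode_authsys_parms(stamp, machinename, uid, gid, gids):
    """Return (encoded body, group IDs that actually fit into the credential)."""
    # Assumption of this sketch: everything after the first 16 groups is dropped.
    sent = list(gids)[:MAX_AUTH_SYS_GIDS]
    body = xdr_uint(stamp) + xdr_string(machinename) + xdr_uint(uid) + xdr_uint(gid)
    body += xdr_uint(len(sent)) + b"".join(xdr_uint(g) for g in sent)
    return body, sent


# Hypothetical user with 30 supplementary groups (gids 100..129).
body, sent = encode_authsys_parms(0, "client", 1000, 10, range(100, 130))
print(len(sent))  # 16: the remaining 14 groups never reach the server
```

However many groups the user has locally, the server only ever sees what fits into that array.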
That said: the same definition is already in RFC 1057 from 1988. Windows 2.1x was introduced that year. Linus would release his kernel three years later. It’s a 25 year old specification. Perhaps another example of why you should never assume your stuff won’t be used in 25 years. Perhaps people will have to develop workarounds for your stuff in 25 years.
A short rant and an announcement
That said, I have a lot of problems with AUTH_SYS as a security mechanism for NFS. I don’t want to make the case against AUTH_SYS in this blog entry. Many have written before about the intrinsic security issues of this mechanism, which was introduced a long time ago based on assumptions that are no longer valid in many cases. At the very least, the usage of AUTH_SYS needs a lot of thought to protect your data. However, it’s the reality that many installations are still using this really basic mechanism.
That said, one of the next tutorials will be about setting up the alternatives that have been developed since then to make user authentication and identification more secure.
The problem is easy to show: First you have to start up a Solaris 11.1 client; after that you have to fire up a Solaris 10 VM and a Solaris 11.1 VM for use as fileservers. 192.168.1.147 is the client named client, 192.168.1.149 is the Solaris 11 system named s11, and 192.168.1.150 is the Solaris 10 system.
Then you have to raise ngroups_max in /etc/system on all systems used in the test, for example with the line set ngroups_max=1024.
Now you have to reboot the systems. Create a user. In my example I’ve used the username jmoekamp. Now you add a number of groups to your systems by adding the following snippet to the /etc/group files of all three systems.
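If you don’t want to type such a snippet by hand, a few lines of Python can generate it. This is only a sketch: the naming scheme (group gN with gid 99 + N) is inferred from the groups that show up later in this entry (g5 with gid 104, g21 with gid 120), and the number of groups (25 here) is an assumption; anything above 16 that still includes g21 would do.

```python
def group_lines(user, count=25, first_gid=100):
    """Return /etc/group entries g1..g<count>, each listing `user` as a member.

    Format of each entry: groupname:password:gid:user-list
    """
    return [f"g{i}::{first_gid + i - 1}:{user}" for i in range(1, count + 1)]


if __name__ == "__main__":
    # Print the entries so they can be appended to /etc/group on each system.
    print("\n".join(group_lines("jmoekamp")))
```

The output has to be appended to the existing /etc/group on each system, not written over it.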
As you want to demonstrate something with NFS, setting up an NFS share doesn’t hurt. First on the Solaris 11.1 system.
Then you repeat the same steps on the Solaris 10 system.
Now you mount both filesystems on the client:
Situation with a Solaris 10 NFS server
Okay, now let’s check. By using the /s10 directory we are using the Solaris 10 based system.
As you see, the file is owned by group 120 and it’s correctly translated to the name g21. The user jmoekamp is a member of group g21. So you should be able to access it. However, when you try it, the outcome is a little bit different.
The system denies you access to the file. Okay, let’s try it with a different group. I log into the fileserver and change the group of the file:
I try it again. And now the system gives me access to the file.
However the user is definitely in both groups:
So why does one request yield only a permission denied while the other gives you access to the file, despite the fact that the user is in both groups, and despite the fact that /etc/group and /etc/passwd are absolutely identical?
Simply said, the problem is that the NFS server doesn’t use its own /etc/group to allow or to deny access. It uses the data in the AUTH_SYS data structure I’ve shown to you earlier. Let’s look into it by using snoop -v -d net0 host 192.168.1.147 and host 192.168.1.150:
As you have surely noticed, the gid of g21 (120) isn’t in the credentials, whereas the gid of g5 (104) is in the structure. So for the NFS server, the user jmoekamp is simply not in the group g21 and so it denies access.
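The decision on the server side can be modelled in a few lines of Python. This is a deliberately simplified sketch, not the actual server code: the class and function names, the uid and the primary gid are made up, and the assumption that exactly the gids 100 to 115 arrive is mine (the capture only tells us that 104 is there and 120 is not).

```python
from dataclasses import dataclass


@dataclass
class AuthSysCred:
    uid: int
    gid: int
    gids: list  # at most 16 entries arrive on the wire


def group_access_allowed(cred, file_gid):
    """Group check based only on what the credential carries.

    The server's own /etc/group is never consulted here.
    """
    return file_gid == cred.gid or file_gid in cred.gids


# Hypothetical credential: 16 supplementary groups, gids 100..115.
cred = AuthSysCred(uid=1000, gid=10, gids=list(range(100, 116)))
print(group_access_allowed(cred, 120))  # False: g21 was never transmitted
print(group_access_allowed(cred, 104))  # True: g5 is among the 16 groups
```

Both groups exist on the server, but only the one that made it into the credential counts.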
Of course this behaviour is broken; however, it obeys the standard. This is the basic reason why Solaris 10 prints a warning when starting up.
Situation with a Solaris 11.1 NFS server
The situation is different since Solaris 11.1, because it introduced a mechanism to work with NFS and AUTH_SYS with more than 16 groups per user, without breaking the standard.
You can access the file. Perhaps it was just luck. Let’s try another group.
With Solaris 11.1 your user can be in more than 16 groups and AUTH_SYS still works, without the issue shown with Solaris 10.
It’s not that the protocol has been extended to carry more groups, as this could break compatibility and would not work with other NFS clients unaware of such extensions. When you look at the packet capture, it’s pretty much the same as with Solaris 10.
So, why does it work with Solaris 11.1? The answer is in the message that appears when you start a Solaris 11.1 system with ngroups_max larger than 16:
That’s different from the output in Solaris 10.
The “trick” is pretty straightforward: As long as you haven’t touched ngroups_max, or when the AUTH_SYS user credentials contain fewer than 16 groups, nothing changes. The NFS server will use the group information in the user credentials delivered by AUTH_SYS. However, if ngroups_max is equal to or larger than 16 and there are exactly 16 groups in the credentials transmitted by AUTH_SYS, the server will resolve the user ID to a username and look up the groups of that user on its own from the configured name services. Obviously you need identical user and group information on all hosts; however, when using AUTH_SYS it should be that way anyway, as a certain user ID should belong to the same user on both systems. So using NIS or LDAP is a really good idea for such an environment.
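The rule described above fits into a small Python sketch. Again, this is only a model under assumptions: the function names and the dictionary standing in for the name service are hypothetical, and the condition simply follows the wording of this entry (ngroups_max of 16 or more, exactly 16 groups in the credential).

```python
AUTH_SYS_MAX_GIDS = 16


def effective_gids(cred_uid, cred_gids, ngroups_max, lookup_groups):
    """Return the group IDs the server would use for the access check."""
    if ngroups_max >= AUTH_SYS_MAX_GIDS and len(cred_gids) == AUTH_SYS_MAX_GIDS:
        # The list may have been cut off, so ask the name service instead.
        return list(lookup_groups(cred_uid))
    # Fewer than 16 groups: the credential is taken as complete.
    return list(cred_gids)


# Hypothetical name service: uid 1000 is a member of gids 100..124.
NAME_SERVICE = {1000: list(range(100, 125))}


def lookup(uid):
    """Stand-in for a lookup in the configured name services."""
    return NAME_SERVICE.get(uid, [])


wire = list(range(100, 116))  # the 16 groups that fit into the credential
print(120 in effective_gids(1000, wire, 1024, lookup))      # True
print(120 in effective_gids(1000, wire[:5], 1024, lookup))  # False
```

The second call shows the unchanged path: with fewer than 16 groups in the credential, the server trusts the credential as before.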
So when you have users belonging to more than 16 groups and still want or have to use AUTH_SYS, Solaris 11.1 gives you the necessary mechanisms to do so.
Do you want to learn more?
docs.oracle.com - ngroups_max