
A user having surprising trouble running more resource-intensive Hive queries


The problem

A couple of months ago, one of our data analysts repeatedly ran into trouble when he wanted to run more resource-intensive Hive queries. Surprisingly, his queries were valid and syntactically correct, and they ran successfully on small datasets, but they simply failed on larger ones. On the other hand, other users were able to run the same queries successfully on the same large datasets. Obviously, this sounds like some permissions problem; however, the user had the right HDFS and Hive permissions.


The observations

We observed that when our user ran a more resource-intensive Hive query (one that spawns a lot of map tasks), the Hadoop cluster (especially the HDFS daemons) experienced stability problems: the NameNode became less responsive and froze, causing tens of DataNodes to lose connectivity and be marked “dead” (even though the DataNode daemons were still running on those servers).

The NameNode logs showed a lot of warnings and exceptions thrown in the method org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getUnixGroups(final String user). Just to give some numbers: 14,592 warnings/exceptions were logged in only 8 minutes (4,768/min at the peak).

The method ShellBasedUnixGroupsMapping.getUnixGroups(final String user) is responsible for retrieving the list of groups that a given user belongs to, by running a Unix command on the NameNode server.

package org.apache.hadoop.security;
...
import org.apache.hadoop.util.Shell;
import org.apache.hadoop.util.Shell.ExitCodeException;
 
/**
 * A simple shell-based implementation of {@link GroupMappingServiceProvider} 
 * that exec's the <code>groups</code> shell command to fetch the group
 * memberships of a given user.
 */
public class ShellBasedUnixGroupsMapping implements GroupMappingServiceProvider {
  ...
  /** 
   * Get the current user's group list from Unix by running the command 'groups'
   * NOTE. For non-existing user it will return EMPTY list
   * @param user user name
   * @return the groups list that the <code>user</code> belongs to
   * @throws IOException if encounter any error when running the command
   */
  private static List<String> getUnixGroups(final String user) throws IOException {
    String result = "";
    try {
      result = Shell.execCommand(Shell.getGroupsForUserCommand(user));
    } catch (ExitCodeException e) {
      // if we didn't get the group - just return empty list;
      LOG.warn("got exception trying to get groups for user " + user, e);
    }
 
    StringTokenizer tokenizer = new StringTokenizer(result);
    List<String> groups = new LinkedList<String>();
    while (tokenizer.hasMoreTokens()) {
      groups.add(tokenizer.nextToken());
    }
    return groups;
  }
}

The Unix command used to find the user-group mapping is simply id -Gn username, as the code snippet below shows:

package org.apache.hadoop.util;
...
abstract public class Shell {
  /** a Unix command to get a given user's groups list */
  public static String[] getGroupsForUserCommand(final String user) {
    //'groups username' command return is non-consistent across different unixes
    return new String [] {"bash", "-c", "id -Gn " + user};
  }
  ...
}
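
To see how this fails for an account that is missing on the NameNode host, here is a minimal probe that exercises the same call path. It is only a sketch: nosuchuser is a made-up name, and it assumes the Hadoop jars are on the classpath.

import java.io.IOException;
import org.apache.hadoop.util.Shell;
import org.apache.hadoop.util.Shell.ExitCodeException;

public class GroupsProbe {
  public static void main(String[] args) throws IOException {
    try {
      // the same call the NameNode makes when resolving a user's groups
      String out = Shell.execCommand(Shell.getGroupsForUserCommand("nosuchuser"));
      System.out.println("groups: " + out);
    } catch (ExitCodeException e) {
      // 'id -Gn nosuchuser' exits non-zero when the account is missing,
      // which is exactly what getUnixGroups() catches and logs as a warning
      System.out.println("no such account on this host: " + e.getMessage());
    }
  }
}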

Security in Apache Hadoop

Normally (with the default settings), Apache Hadoop is a very trusty elephant. The username of the user who submits a job is simply taken from the client machine (and not verified at all, so one user can easily impersonate another, e.g. by typing sudo -u other-user command, or by creating a user account for “other-user” and accessing HDFS on behalf of that user), while the group names are resolved on the NameNode server using just the Unix command id -Gn. If a user does not have an account on the NameNode server, an ExitCodeException is thrown by Shell.execCommand and caught in the ShellBasedUnixGroupsMapping.getUnixGroups method. If your job is large (like the Hive query submitted by our user, which consisted of thousands of map tasks), thousands of group lookups will each fork a shell and throw an ExitCodeException, the NameNode will stop responding, and DataNodes will lose connectivity. If this lasts longer than 2 * heartbeat.recheck.interval + 10 * heartbeat.interval milliseconds (10 min 30 sec by default), the DataNodes may end up marked “dead” by the time the NameNode wakes up.
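
To put numbers on that formula: with the default values of heartbeat.recheck.interval (5 minutes, i.e. 300,000 ms) and heartbeat.interval (3 seconds), the threshold is 2 * 300,000 ms + 10 * 3,000 ms = 630,000 ms, which is exactly the 10 min 30 sec quoted above.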

Possible fixes

How could this problem be solved? Obviously, a quick and dirty solution is to create an account on the NameNode server for every user who accesses HDFS (directly, or by submitting MapReduce jobs to the cluster). However, for many reasons, you do not want to give everybody an account on the NameNode server.

User-group resolution with AD/LDAP

Instead, AD or LDAP can be used to resolve the group membership of users who access HDFS. Hadoop provides a number of configuration settings, hadoop.security.group.mapping.ldap.* (you can find them in core-default.xml).

Alternatively, nss_ldap (which allows LDAP directory servers to be used as a primary source of name-service information, e.g. users, hosts, and groups) can be tried. In this case, setting the hadoop.security.group.mapping.ldap.* configuration options is not necessary.

We actually solved this issue by using nss_ldap, because the LDAP configuration settings hadoop.security.group.mapping.ldap.* did not work correctly in our case. The values that we wanted to use are as follows:

hadoop.security.group.mapping.ldap.search.filter.group = (objectClass=posixGroup)
hadoop.security.group.mapping.ldap.search.attr.member = memberUid
hadoop.security.group.mapping.ldap.search.attr.group.name = cn
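
For illustration, these settings would be applied roughly as in the sketch below, shown here via the Java Configuration API (in practice they belong in core-site.xml; the LDAP URL and search base are made-up placeholders, and, as discussed next, the posixGroup filter turned out not to be supported):

import org.apache.hadoop.conf.Configuration;

public class LdapGroupMappingSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // switch group resolution from the shell-based default to LDAP
    conf.set("hadoop.security.group.mapping",
             "org.apache.hadoop.security.LdapGroupsMapping");
    // made-up placeholders - point these at your own directory server
    conf.set("hadoop.security.group.mapping.ldap.url", "ldap://ldap.example.com");
    conf.set("hadoop.security.group.mapping.ldap.base", "dc=example,dc=com");
    // the three values we wanted to use
    conf.set("hadoop.security.group.mapping.ldap.search.filter.group",
             "(objectClass=posixGroup)");
    conf.set("hadoop.security.group.mapping.ldap.search.attr.member", "memberUid");
    conf.set("hadoop.security.group.mapping.ldap.search.attr.group.name", "cn");
  }
}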

The problematic one is hadoop.security.group.mapping.ldap.search.filter.group: according to the documentation, posixGroup (which is what we aimed to use) is currently not a supported group class.

Strong authentication with Kerberos

One can go even one step further and use Kerberos. Although Kerberos is usually configured to take advantage of AD/LDAP servers (so the way user-group mappings are resolved does not change), it also provides full authentication of users accessing the cluster (so that a user’s identity is verified and nobody can easily impersonate another user).

Just one thing to note: installing and configuring Kerberos involves many tedious and difficult steps (some of them can be automated by Cloudera Manager). Basically, it is not just a matter of changing the configuration property hadoop.security.authentication from simple to kerberos. Before you make a decision to use Kerberos, make sure that you really need it.
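
For reference, the property switch mentioned above looks like the sketch below (again via the Java Configuration API). On its own it is nowhere near enough: principals, keytabs and per-daemon settings are also required.

import org.apache.hadoop.conf.Configuration;

public class KerberosSwitchSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // the single property mentioned above; real deployments need much more
    conf.set("hadoop.security.authentication", "kerberos"); // default: "simple"
    conf.set("hadoop.security.authorization", "true");      // enable service-level authorization
  }
}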

