Big Data/Analytics Zone is brought to you in partnership with:

Jakub is a Java EE developer since 2005 and occasionally a project manager, working currently with Iterate AS. He's highly interested in developer productivity (and tools like Maven and AOP/AspectJ), web frameworks, Java portals, testing and performance and works a lot with IBM technologies. A native to Czech Republic, he lives now in Oslo, Norway. Jakub is a DZone MVB and is not an employee of DZone and has posted 154 posts at DZone. You can read more from them at their website. View Full User Profile

Enabling JMX Monitoring for Hadoop & Hive

09.25.2012
| 5220 views |
  • submit to reddit

Hadoop’s NameNode and JobTracker expose interesting metrics and statistics over the JMX. Hive seems not to expose anything intersting but it still might be useful to monitor its JVM or do simpler profiling/sampling on it. Let’s see how to enable JMX and how to access it securely, over SSH.

Background: We run NameNode, JobTracker and Hive on the same server. Monitoring og TaskTrackers and DataNodes isn’t that interesting but still might be useful to have.

Configuration

/etc/hadoop/hadoop-env.sh

diff --git a/etc/hadoop/hadoop-env.sh b/etc/hadoop/hadoop-env.sh
index 69a13b1..e8ca596 100644
--- a/etc/hadoop/hadoop-env.sh
+++ b/etc/hadoop/hadoop-env.sh
@@ -14,7 +14,8 @@ export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}
 #export HADOOP_NAMENODE_INIT_HEAPSIZE=""

 # Extra Java runtime options. Empty by default.
-export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true $HADOOP_CLIENT_OPTS"
+# Added $HIVE_OPTS that is set by hive-env.sh when starting hiveserver
+export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true $HADOOP_CLIENT_OPTS $HIVE_OPTS"

 # Command specific options appended to HADOOP_OPTS when specified
 export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT $HADOOP_NAMENODE_OPTS"
@@ -43,3 +44,16 @@ export HADOOP_SECURE_DN_PID_DIR=/var/run/hadoop

 # A string representing this instance of hadoop. $USER by default.
 export HADOOP_IDENT_STRING=$USER
+
+### JMX settings
+export JMX_OPTS=" -Dcom.sun.management.jmxremote.authenticate=false \
+    -Dcom.sun.management.jmxremote.ssl=false \
+    -Dcom.sun.management.jmxremote.port"
+#    -Dcom.sun.management.jmxremote.password.file=$HADOOP_HOME/conf/jmxremote.password \
+#    -Dcom.sun.management.jmxremote.access.file=$HADOOP_HOME/conf/jmxremote.access"
+export HADOOP_NAMENODE_OPTS="$JMX_OPTS=8006 $HADOOP_NAMENODE_OPTS"
+export HADOOP_SECONDARYNAMENODE_OPTS="$HADOOP_SECONDARYNAMENODE_OPTS"
+export HADOOP_DATANODE_OPTS="$JMX_OPTS=8006 $HADOOP_DATANODE_OPTS"
+export HADOOP_BALANCER_OPTS="$HADOOP_BALANCER_OPTS"
+export HADOOP_JOBTRACKER_OPTS="$JMX_OPTS=8007 $HADOOP_JOBTRACKER_OPTS"
+export HADOOP_TASKTRACKER_OPTS="$JMX_OPTS=8007 $HADOOP_TASKTRACKER_OPTS"

The JMX setting is used for Hadoop’s daemons while the HIVE_OPTS was added for Hive.

<hive home>/conf/hive-env.sh

Enable JMX when running the Hive thrift server (we don’t want it when running the command-line client etc. since it’s pointless and we wouldn’t need to make sure that each of them has a unique port):

if [ "$SERVICE" = "hiveserver" ]; then
  JMX_OPTS="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=8008"
  export HIVE_OPTS="$HIVE_OPTS $JMX_OPTS"
fi

Pitfalls

When you start Hive server via hive –service hiveserver then it actually executes “hadoop jar …” so to be able to pass options from hive-env.sh to the JVM we had to add $HIVE_OPTS in hadoop-env.sh. (I haven’t found a cleaner way to do it.)

Effects

When we now start Hive or any of the Hadoop daemons, they will expose their metrics at their respective ports (NameNode – 8006, JobTracker – 8007, Hive – 8008).

(If you are running DataNode and/or TaskTracker on the same machine then you’ll need to change their ports to be unique.)

Secure Connection Over SSH

Read the post VisualVM: Monitoring Remote JVM Over SSH (JMX Or Not) to find out how to connect securely to the JMX ports over ssh, f.ex. with VisualVM (spolier: ssh -D 9696 hostname; use proxy at localhost:9696).

 

 

Published at DZone with permission of Jakub Holý, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)