Wednesday, 9 December 2015

container-launch exception in CDH 5.3 while launching spark-shell or submitting spark job in YARN deploy mode

It is a kind of bug in CDH 5.3 spark configuration but couple of months back while running first spark job, it was a big mysterical hurdle. I am adding a post on it as i came across couple of guys in my workplace facing same issue. I am hoping it might help other guys stepping into spark world through CDH 5.3 .

Below is the sequence of issue occurence and resolution:


  • [cloudera@quickstart bin]$  ./spark-shell --master yarn
2015-10-25 22:42:09,321 INFO  [main] spark.SecurityManager (Logging.scala:logInfo(59)) - Changing view acls to: cloudera
2015-10-25 22:42:09,332 INFO  [main] spark.SecurityManager (Logging.scala:logInfo(59)) - Changing modify acls to: cloudera
2015-10-25 22:42:09,333 INFO  [main] spark.SecurityManager (Logging.scala:logInfo(59)) - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(cloudera); users with modify permissions: Set(cloudera)
2015-10-25 22:42:09,333 INFO  [main] spark.HttpServer (Logging.scala:logInfo(59)) - Starting HTTP Server
2015-10-25 22:42:09,483 INFO  [main] server.Server (Server.java:doStart(272)) - jetty-8.y.z-SNAPSHOT
2015-10-25 22:42:09,520 INFO  [main] server.AbstractConnector (AbstractConnector.java:doStart(338)) - Started SocketConnector@0.0.0.0:40247
2015-10-25 22:42:09,521 INFO  [main] util.Utils (Logging.scala:logInfo(59)) - Successfully started service 'HTTP class server' on port 40247.
............................................
2015-10-25 22:42:42,999 INFO  [main] yarn.Client (Logging.scala:logInfo(59)) -
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.cloudera
start time: 1445838161149
final status: UNDEFINED
tracking URL: http://quickstart.cloudera:8088/proxy/application_1436476349975_0011/
user: cloudera
2015-10-25 22:42:44,672 INFO  [main] yarn.Client (Logging.scala:logInfo(59)) - Application report for application_1436476349975_0011 (state: ACCEPTED)
2015-10-25 22:42:45,681 INFO  [main] yarn.Client (Logging.scala:logInfo(59)) - Application report for application_1436476349975_0011 (state: ACCEPTED)
2015-10-25 22:42:55,364 INFO  [main] yarn.Client (Logging.scala:logInfo(59)) - Application report for application_1436476349975_0011 (state: ACCEPTED)
2015-10-25 22:43:00,660 INFO  [main] yarn.Client (Logging.scala:logInfo(59)) - Application report for application_1436476349975_0011 (state: ACCEPTED)
2015-10-25 22:43:01,849 INFO  [main] yarn.Client (Logging.scala:logInfo(59)) - Application report for application_1436476349975_0011 (state: ACCEPTED)
2015-10-25 22:43:02,854 INFO  [main] yarn.Client (Logging.scala:logInfo(59)) - Application report for application_1436476349975_0011 (state: FAILED)
2015-10-25 22:43:02,854 INFO  [main] yarn.Client (Logging.scala:logInfo(59)) -
client token: N/A
diagnostics: Application application_1436476349975_0011 failed 2 times due to AM Container for appattempt_1436476349975_0011_000002 exited with  exitCode: 1 due to: Exception from container-launch.
Container id: container_1436476349975_0011_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:197)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Container exited with a non-zero exit code 1
.Failing this attempt.. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.cloudera
start time: 1445838161149
final status: FAILED
tracking URL: http://quickstart.cloudera:8088/cluster/app/application_1436476349975_0011
user: cloudera
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:102)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:58)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:140)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:335)
at org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:986)
at $iwC$$iwC.<init>(<console>:9)
at $iwC.<init>(<console>:18)
at <init>(<console>:20)
at .<init>(<console>:24)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:852)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1125)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:674)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:705)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:669)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:828)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:873)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:785)
at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:123)
at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:122)
at org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:270)
at org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:122)
at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:60)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1$$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:147)
at org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:60)
at org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:106)
at org.apache.spark.repl.SparkILoop.postInitialization(SparkILoop.scala:60)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:962)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:916)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1011)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

  • One thing was obvious after reading the above stacktrace, there is some exception in launching container for spark-shell application, application_1436476349975_0011. The application-id can be retrieved from spark-shell console logs above.
  • The next thing was to check yarn logs for this application.
[cloudera@quickstart bin]$ yarn logs -applicationId application_1436476349975_0011
15/10/25 22:44:39 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032


Container: container_1436476349975_0011_01_000001 on quickstart.cloudera_60545
================================================================================
LogType: stderr
LogLength: 1151
Log Contents:
Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/impl/StaticLoggerBinder
at org.apache.spark.Logging$class.initializeLogging(Logging.scala:116)
at org.apache.spark.Logging$class.initializeIfNecessary(Logging.scala:107)
at org.apache.spark.Logging$class.log(Logging.scala:51)
at org.apache.spark.deploy.yarn.ApplicationMaster$.log(ApplicationMaster.scala:495)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:511)
at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:536)
at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
Caused by: java.lang.ClassNotFoundException: org.slf4j.impl.StaticLoggerBinder
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 7 more

LogType: stdout
LogLength: 0
Log Contents:



Container: container_1436476349975_0011_02_000001 on quickstart.cloudera_60545
================================================================================
LogType: stderr
LogLength: 1151
Log Contents:
Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/impl/StaticLoggerBinder
at org.apache.spark.Logging$class.initializeLogging(Logging.scala:116)
at org.apache.spark.Logging$class.initializeIfNecessary(Logging.scala:107)
at org.apache.spark.Logging$class.log(Logging.scala:51)
at org.apache.spark.deploy.yarn.ApplicationMaster$.log(ApplicationMaster.scala:495)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:511)
at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:536)
at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
Caused by: java.lang.ClassNotFoundException: org.slf4j.impl.StaticLoggerBinder
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 7 more

LogType: stdout
LogLength: 0
Log Contents:


  • On first look, It seems to be a classpath issue due to which it was not able to find slf4j lib in classpath.
  • Resolution lies in configuration file, /etc/spark/conf/spark-env.sh where there is a minor typographical mistake.
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-etc/hadoop/conf}

In the above statement in spark-env.sh, there is a / missing in front of etc/hadoop/conf due to which it is missing slf4j.

so, i changed it to 

export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/hadoop/conf}

  • Now it is running fit and fine.
Happy Hacking !!!

1 comment:

  1. Great post! I am see the programming coding and step by step execute the outputs.I am gather this coding more information. It's helpful for me my friend. Also great blog here with all of the valuable information you have.
    PHP Training in Chennai

    ReplyDelete