Thursday 23 July 2015

Hive 0.13: Got Exception while executing a query with UDF due to Kryo Serialization of MapWork

I wrote a custom UDF that internally maintains a Map of Lists as part of its logic. But when I tried to use it in a query:
hive> ADD JAR customUDF.jar;
hive> CREATE TEMPORARY FUNCTION customUDF AS 'org.custom.MyUDF';
hive> select x, customUDF(y) from A;

I got the following error:
----------------------------------------
xxxx-xx-xx xx:xx:xx,xxx INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
xxxx-xx-xx xx:xx:xx,xxx INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
xxxx-xx-xx xx:xx:xx,xxx INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1436789290291_17819, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@6be236d0)
---------------------------------------------------
----------------------------------------------------
xxxx-xx-xx xx:xx:xx,xxx INFO [main] org.apache.hadoop.mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
xxxx-xx-xx xx:xx:xx,xxx INFO [main] org.apache.hadoop.hive.ql.log.PerfLogger: <PERFLOG method=deserializePlan from=org.apache.hadoop.hive.ql.exec.Utilities>
xxxx-xx-xx xx:xx:xx,xxx INFO [main] org.apache.hadoop.hive.ql.exec.Utilities: Deserializing MapWork via kryo
xxxx-xx-xx xx:xx:xx,xxx FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.IllegalAccessError: tried to access class sun.nio.cs.UTF_8 from class sun.nio.cs.UTF_8ConstructorAccess
 at sun.nio.cs.UTF_8ConstructorAccess.newInstance(Unknown Source)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo$1.newInstance(Kryo.java:1062)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.newInstance(Kryo.java:1112)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.create(FieldSerializer.java:526)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:502)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:672)
 at org.apache.hadoop.hive.ql.exec.Utilities.deserializeObjectByKryo(Utilities.java:918)
 at org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:826)
 at org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:840)
 at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:333)
 at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:275)
 at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:254)
 at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)
 at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)
 at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
 at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:172)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:414)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1469)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

Solution:
Set `hive.plan.serialization.format=javaXML` in the Hive session before running the query.
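
As a minimal sketch, the session from the top of this post would look like this with the setting applied. The jar, function, and table names are the ones used above. Setting the property per session is one option; it is an assumption that a session-level setting is enough in your environment.

```sql
hive> ADD JAR customUDF.jar;
hive> CREATE TEMPORARY FUNCTION customUDF AS 'org.custom.MyUDF';
hive> -- switch plan serialization from kryo to Java XML for this session only
hive> set hive.plan.serialization.format=javaXML;
hive> -- print the current value to confirm the setting took effect
hive> set hive.plan.serialization.format;
hive> select x, customUDF(y) from A;
```

The setting applies only to the current session, so other queries on the cluster keep using kryo.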

Reason:
The exception is thrown while deserializing an object that was serialized with the Kryo library. Kryo is a fast Java serialization library. The stack trace is very generic and does not tell us which object failed to deserialize.
The hint lies in Hive internals and in the log line just before the error:
xxxx-xx-xx xx:xx:xx,xxx INFO [main] org.apache.hadoop.hive.ql.exec.Utilities: Deserializing MapWork via kryo

Internally, when we submit a Hive query, the Hive execution engine builds an execution plan detailing the work to be done in the map and reduce phases of each MapReduce job that is going to be launched. It uses Kryo to serialize the plan for the map and reduce phases. The mappers and reducers pick up their corresponding serialized plan and deserialize it to learn what to execute.
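
The plan Hive builds for a query can be inspected with EXPLAIN without launching any job. This is a sketch reusing the names from above. The output depends on the Hive version and the table, so it is not shown here, but the call to customUDF should appear among the expressions of the map-side operators, which is the part of the plan that gets serialized as MapWork.

```sql
hive> ADD JAR customUDF.jar;
hive> CREATE TEMPORARY FUNCTION customUDF AS 'org.custom.MyUDF';
hive> -- show the stages and operator trees Hive would execute for this query
hive> EXPLAIN select x, customUDF(y) from A;
```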

Due to some open issues (https://issues.apache.org/jira/browse/HIVE-7711), Kryo might not be able to serialize the custom UDF in the query. Thus, we have to switch to the old and trusted Java XML serialization. It is not as optimized as Kryo, but it is still worth using until these Kryo issues are resolved.
