Thursday, 23 July 2015

Hive 0.13: Exception while executing a query with a custom UDF, caused by Kryo serialization of MapWork

I wrote a custom UDF that internally maintains a Map of Lists as part of its logic. But when I tried to use it in a query:
hive> ADD JAR customUDF.jar;
hive> CREATE TEMPORARY FUNCTION customUDF AS 'org.custom.MyUDF';
hive> select x, customUDF(y) from A;

I got the following error:
----------------------------------------
xxxx-xx-xx xx:xx:xx,xxx INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
xxxx-xx-xx xx:xx:xx,xxx INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
xxxx-xx-xx xx:xx:xx,xxx INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1436789290291_17819, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@6be236d0)
---------------------------------------------------
----------------------------------------------------
xxxx-xx-xx xx:xx:xx,xxx INFO [main] org.apache.hadoop.mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
xxxx-xx-xx xx:xx:xx,xxx INFO [main] org.apache.hadoop.hive.ql.log.PerfLogger: <PERFLOG method=deserializePlan from=org.apache.hadoop.hive.ql.exec.Utilities>
xxxx-xx-xx xx:xx:xx,xxx INFO [main] org.apache.hadoop.hive.ql.exec.Utilities: Deserializing MapWork via kryo
xxxx-xx-xx xx:xx:xx,xxx FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.IllegalAccessError: tried to access class sun.nio.cs.UTF_8 from class sun.nio.cs.UTF_8ConstructorAccess
 at sun.nio.cs.UTF_8ConstructorAccess.newInstance(Unknown Source)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo$1.newInstance(Kryo.java:1062)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.newInstance(Kryo.java:1112)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.create(FieldSerializer.java:526)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:502)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
 at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:672)
 at org.apache.hadoop.hive.ql.exec.Utilities.deserializeObjectByKryo(Utilities.java:918)
 at org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:826)
 at org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:840)
 at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:333)
 at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:275)
 at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:254)
 at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)
 at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)
 at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
 at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:172)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:414)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1469)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

Solution:
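Switch the query plan serialization from Kryo to Java XML in the session before running the query:
hive> set hive.plan.serialization.format=javaXML;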
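The set command only applies to the current CLI session. To make the workaround stick, the same line can go into the Hive CLI initialization file. Below is a minimal shell sketch, assuming the Hive CLI is started without -i so that it reads $HOME/.hiverc at startup; HIVERC is a hypothetical variable used only in this sketch to choose the target file.

```shell
#!/bin/sh
# Persist the Kryo workaround for every Hive CLI session.
# Assumption: the Hive CLI reads $HOME/.hiverc at startup when run without -i.
# HIVERC is a hypothetical override for this sketch; it defaults to ~/.hiverc.
HIVERC="${HIVERC:-$HOME/.hiverc}"
SETTING='set hive.plan.serialization.format=javaXML;'

# Append the setting only if that exact line is not already present,
# so running the script twice does not duplicate it.
# grep -q: quiet, -x: whole-line match, -F: fixed string (no regex).
if ! grep -qxF "$SETTING" "$HIVERC" 2>/dev/null; then
  printf '%s\n' "$SETTING" >> "$HIVERC"
fi
```

For a one-off run, the property can instead be passed on the command line, e.g. hive --hiveconf hive.plan.serialization.format=javaXML -f query.hql (query.hql is a placeholder for your own script).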

Reason:
The stack trace comes from a failure while deserializing an object that was serialized with the Kryo library. Kryo is a fast Java serialization library, but the trace is very generic and does not tell us which object failed to deserialize.
The hint lies in Hive's internals and in the log line just before the error:
xxxx-xx-xx xx:xx:xx,xxx INFO [main] org.apache.hadoop.hive.ql.exec.Utilities: Deserializing MapWork via kryo

Internally, when we submit a Hive query, the Hive execution engine builds an execution plan that details the work to be done in the map and reduce phases of each MapReduce job it is going to launch. It uses Kryo to serialize the plan of each phase (MapWork for the map phase, ReduceWork for the reduce phase). Each mapper and reducer picks up its serialized plan, deserializes it, and learns from it what to execute.

Because of some open issues (https://issues.apache.org/jira/browse/HIVE-7711), Kryo may fail to round-trip (serialize and then deserialize) a custom UDF in the query plan. The workaround is to switch back to the older, trusted Java XML serialization. It is less efficient than Kryo, but it is worth using until these Kryo issues are resolved.

Tuesday, 7 July 2015

Unix/Linux - Must-Know Hacks

This is a reference post for me to look back on the most important and most used hacks in the Unix world. As Unix/Linux is bread and butter for developers like me, it may be handy for others too.

For each Unix hack I came across during my work, I list the subject and the link(s) that provide a solution to the problem. Kudos to all the Linux administrators and techies whose contributions have made life easier for other developers.

1. How to automatically start services on boot?
http://www.abhigupta.com/2010/06/how-to-auto-start-services-on-boot-in-centos-redhat/

2. How to remove a service from automatic start on boot?
http://www.abhigupta.com/2010/06/how-to-auto-start-services-on-boot-in-centos-redhat/

3. How to run a long-running process detached from the session (keeping it running even after the session is closed) and attach it to a session later?
http://www.thegeekstuff.com/2010/07/screen-command-examples/
http://www.tecmint.com/screen-command-examples-to-manage-linux-terminals/

4. Sticky bit in Unix file permissions
https://en.wikipedia.org/wiki/Sticky_bit
http://www.thegeekstuff.com/2013/02/sticky-bit/
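A quick way to see the sticky bit in action, sketched in shell on a throwaway temporary directory created only for this demo. With the sticky bit set on a world-writable directory (as /tmp usually is), any user can create files there, but only a file's owner (or the directory owner, or root) can delete or rename it.

```shell
#!/bin/sh
# Create a throwaway directory for the demo.
DEMO_DIR=$(mktemp -d)

# Leading 1 = sticky bit, 777 = read/write/execute for everyone.
# Symbolic equivalent for just the sticky bit: chmod +t "$DEMO_DIR"
chmod 1777 "$DEMO_DIR"

# ls -ld shows the sticky bit as "t" in the last permission position,
# i.e. the mode string starts with drwxrwxrwt.
ls -ld "$DEMO_DIR"

# To clear the sticky bit again: chmod -t "$DEMO_DIR"
```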

5. VI/VIM Cheatsheet

move to start of file           gg or 1G
move to end of file             G
move to nth line of file        nG (e.g., 100G moves to line 100)

6. Unix Special Variables

$#      Stores the number of command-line arguments that were passed
        to the shell program.
$?      Stores the exit value of the last command that was executed.
$0      Stores the first word of the entered command (the name of the
        shell program).
$*      Stores all the arguments that were entered on the command line
        ($1 $2 ...). Quoted as "$*", they are joined into a single word.
"$@"    Stores all the arguments that were entered on the command line,
        individually quoted ("$1" "$2" ...).
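A small script makes the difference between these variables, especially "$*" and "$@", easy to see. This is a sketch; show_args is a hypothetical helper name, and inside a function $#, $*, "$@" and $1, $2, ... refer to the function's own arguments, while $0 still names the script.

```shell
#!/bin/sh
# show_args is a hypothetical helper used only for this demo.
show_args() {
  echo "number of arguments (\$#): $#"
  # "$*" joins all arguments into ONE word (separated by a space by default).
  for a in "$*"; do echo "  \"\$*\" word: [$a]"; done
  # "$@" keeps each argument as a separate word, embedded spaces included.
  for a in "$@"; do echo "  \"\$@\" word: [$a]"; done
}

echo "script name (\$0): $0"
show_args "hello world" foo
# $? right after a command holds its exit value; false always exits with 1.
false || echo "exit value of the last command (\$?): $?"
```

Calling show_args "hello world" foo reports 2 arguments: "$*" yields the single word [hello world foo], while "$@" yields [hello world] and [foo] separately, which is why "$@" is the safe choice when passing arguments on to another command.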