Transposing/pivoting a table means converting the values of one column into a set of new columns, with another column supplying the corresponding values for those new columns.
For example, if the original table, "test_sample", is:
id   code   key   value
------------------------
1    A      p     e
2    B      q     f
3    B      p     f
3    B      q     h
3    B      r     j
3    C      t     k
the transpose/pivot of the table should be:
id   code   p   q   r   t
--------------------------
1    A      e
2    B          f
3    B      f   h   j
3    C                  k
Hive Implementation
1) When the value column is of string type
If "test_sample" is hive table with following table definiton:
create table test_sample(id string, code string, key string, value string)
row format delimited fields terminated by ','
lines terminated by '\n';
hive> set hive.cli.print.header=true;
hive> select * from test_sample;
id   code   key   value
1    A      p     e
2    B      q     f
3    B      p     f
3    B      q     h
3    B      r     j
3    C      t     k
the query to create the transpose is:
select b.id, b.code,
       concat_ws('', b.p) as p,
       concat_ws('', b.q) as q,
       concat_ws('', b.r) as r,
       concat_ws('', b.t) as t
from (
  select id, code,
         collect_list(a.group_map['p']) as p,
         collect_list(a.group_map['q']) as q,
         collect_list(a.group_map['r']) as r,
         collect_list(a.group_map['t']) as t
  from (
    select id, code, map(key, value) as group_map
    from test_sample
  ) a
  group by a.id, a.code
) b;
On execution of this query, the output will be:
id   code   p   q   r   t
--------------------------
1    A      e
2    B          f
3    B      f   h   j
3    C                  k
which is the expected output.
An important point to note is that this does not use any custom UDFs/UDAFs; it relies only on built-in Hive functions, which saves a lot of hassle. "concat_ws" and "map" are Hive UDFs, and "collect_list" is a Hive UDAF.
How it works
- "map" function creates map of values of two columns as key value pairs.
- the in outer query, we did group by on dimension columns (id and code) and aggregate all the values of a particular key using "collect_list"
id code p q r t
1 A ["e"] [] [] []
2 B [] ["f"] [] []
3 B ["f"] ["h"] ["j"] []
3 C [] [] [] ["k"]
- Then in last outer query, we use "concat_ws" function to get a single string value out of array.
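To make the intermediate shape concrete, the innermost subquery can be run on its own. A minimal sketch, with the rows it should produce for the string-valued test_sample shown as comments:
-- innermost query: one single-entry map per input row
select id, code, map(key, value) as group_map
from test_sample;
-- 1    A    {"p":"e"}
-- 2    B    {"q":"f"}
-- 3    B    {"p":"f"}
-- 3    B    {"q":"h"}
-- 3    B    {"r":"j"}
-- 3    C    {"t":"k"}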
2) When the value column is of numeric (int/float/double) type
If "test_sample" is hive table with following table definiton:
create table test_sample(id string, code string, key string, value int) row format delimited fields terminated by ',' lines terminated by '\n';
hive> set hive.cli.print.header=true;
hive> select * from test_sample;
id code key value
1 A p 5
2 B q 6
3 B p 6
3 B q 8
3 B r 10
3 C t 11
the query to create transpose of it is:
On execution of this query, the output will be:
id code p q r t
-----------------------------------------------------------
1 A 5 NULL NULL NULL
2 B NULL 6 NULL NULL
3 B 6 8 10 NULL
3 C NULL NULL NULL 11
If "test_sample" is hive table with following table definiton:
create table test_sample(id string, code string, key string, value int)
row format delimited fields terminated by ','
lines terminated by '\n';
hive> set hive.cli.print.header=true;
hive> select * from test_sample;
id   code   key   value
1    A      p     5
2    B      q     6
3    B      p     6
3    B      q     8
3    B      r     10
3    C      t     11
the query to create the transpose is:
select id, code,
       sum(a.group_map['p']) as p,
       sum(a.group_map['q']) as q,
       sum(a.group_map['r']) as r,
       sum(a.group_map['t']) as t
from (
  select id, code, map(key, value) as group_map
  from test_sample
) a
group by a.id, a.code;
On execution of this query, the output will be:
id   code   p      q      r      t
-----------------------------------
1    A      5      NULL   NULL   NULL
2    B      NULL   6      NULL   NULL
3    B      6      8      10     NULL
3    C      NULL   NULL   NULL   11
which is the expected output. Here "sum" does the work of both "collect_list" and "concat_ws": group_map['p'] is NULL on every row whose key is not 'p', sum ignores NULLs, and a key with no matching rows at all yields NULL.
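If NULLs are undesirable in the pivoted output, each sum can be wrapped in coalesce to substitute a default. A minimal sketch; the use of coalesce here is an assumption about the desired output, not part of the original query:
select id, code,
       coalesce(sum(a.group_map['p']), 0) as p,  -- 0 stands in for "no row with this key"
       coalesce(sum(a.group_map['q']), 0) as q,
       coalesce(sum(a.group_map['r']), 0) as r,
       coalesce(sum(a.group_map['t']), 0) as t
from (
  select id, code, map(key, value) as group_map
  from test_sample
) a
group by a.id, a.code;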
Comments

- This is very useful. But what if I don't want to hard-code the column names (in this case there are only 3 columns, but I have more than 70 columns)?
  - Did you get a solution for this?
  - I'm also looking for an elegant way to pivot dynamically within Hive. However, it's quite straightforward to pivot dynamically using Spark:
# Read flat file with narrow data
# Columns: customer_id,category,weekofyear,totalspend
# (assumes a pyspark shell where sqlContext is predefined)
from pyspark.sql import functions as sf

transactions = sqlContext.read \
    .options(header=True, inferSchema=True) \
    .csv("transaction_data.csv")

# Pivot: one output column per weekofyear value ("T1", "T2", ...),
# summing totalspend per customer
tx_pivot = transactions \
    .withColumn("weekofyear", sf.concat(sf.lit("T"), sf.col("weekofyear"))) \
    .groupBy("customer_id") \
    .pivot("weekofyear") \
    .sum("totalspend")
  - Thank you! You saved my day!
- Is there a way to do the opposite? Thanks in advance.
- How can I pivot rows to columns dynamically in Hive, without hard-coding the column names?
- Thanks much, this is working.
- Nice approach, thanks.
ReplyDeletecheck this link ... just 5 lines query ,, can create pivot table.
ReplyDeleteusing map dat type to convert columns name are keys and values are Value in map ( k,v) then later view explode ( map ) to convert into (k,v) as rows.. what sample.
https://stackoverflow.com/questions/37436710/is-there-a-way-to-transpose-data-in-hive
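A minimal sketch of that map-plus-lateral-view-explode idea, turning a wide layout back into (key, value) rows; the table name wide_sample and its columns are assumptions for illustration:
-- unpivot: one output row per non-empty cell
select id, code, k as key, v as value
from wide_sample
lateral view explode(map('p', p, 'q', q, 'r', r, 't', t)) kv as k, v
where v is not null;  -- drop cells that had no value (assumed to be NULL)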
- Any options for dynamic pivoting for n number of columns?
ReplyDeleteoriginal post
ReplyDeletehttps://stackoverflow.com/questions/23025380/how-to-transpose-pivot-data-in-hive
- Hi, how can we pivot the data in Hive without hard-coding the column names?
ReplyDeletethis we can easily done with spark data frames
ReplyDeleteval newdf=df.groupBy("code").pivot("key").sum("value")