hive sql 处理数据的时候经常会有行转列,列转行的需求。废话不多说,上代码。
hive> select * from test limit 10;
OK
test.user_id test.order_id
104399 1715131
104399 2105395
104399 1758844
104399 981085
104399 2444143
104399 1458638
104399 968412
104400 1609001
104400 2986088
104400 1795054
列传行–> 把相同user_id的order_id按照逗号转为一行
select user_id,
concat_ws(',',collect_list(order_id)) as order_value
from test
group by user_id
limit 10;
//结果(简写)
user_id order_value
104399 1715131,2105395,1758844,981085,2444143
说明:
使用函数:concat_ws(‘,’,collect_set(column)) ;collect_list 不去重,collect_set 去重。 column的数据类型要求是string
行转列 –> 把一行的数据按某个分隔符分割后形成多行
测试数据如下:
hive> select * from test;
OK
test.user_id test.order_value
104408 2909888,2662805,2922438,674972,2877863,190237
104407 2982655,814964,1484250,2323912,2689723,2034331,1692373,677498,156562,2862492,338128
104406 1463273,2351480,1958037,2606570,3226561,3239512,990271,1436056,2262338,2858678
104405 153023,2076625,1734614,2796812,1633995,2298856,2833641,3286778,2402946,2944051,181577,464232
104404 1815641,108556,3110738,2536910,1977293,424564
104403 253936,2917434,2345879,235401,2268252,2149562,2910478,375109,932923,1989353
104402 3373196,1908678,291757,1603657,1807247,573497,1050134,3402420
104401 814760,213922,2008045,3305934,2130994,1602245,419609,2502539,3040058,2828163,3063469
104400 1609001,2986088,1795054,429550,1812893
104399 1715131,2105395,1758844,981085,2444143,1458638,968412
Time taken: 0.065 seconds, Fetched: 10 row(s)
将order_value的每条记录切割为单元素
select user_id,order_value,order_id
from test
lateral view explode(split(order_value,',')) num as order_id
limit 10;
//结果
user_id order_value order_id
104408 2909888,2662805,2922438,674972,2877863,190237 2909888
104408 2909888,2662805,2922438,674972,2877863,190237 2662805
104408 2909888,2662805,2922438,674972,2877863,190237 2922438
104408 2909888,2662805,2922438,674972,2877863,190237 674972
104408 2909888,2662805,2922438,674972,2877863,190237 2877863
104408 2909888,2662805,2922438,674972,2877863,190237 190237
104407 2982655,814964,1484250,2323912,2689723,2034331,1692373,677498,156562,2862492,338128 2982655
104407 2982655,814964,1484250,2323912,2689723,2034331,1692373,677498,156562,2862492,338128 814964
104407 2982655,814964,1484250,2323912,2689723,2034331,1692373,677498,156562,2862492,338128 1484250
104407 2982655,814964,1484250,2323912,2689723,2034331,1692373,677498,156562,2862492,338128 2323912
Time taken: 0.096 seconds, Fetched: 10 row(s)