Contents of the data file:
steven:100;steven:90;steven:99^567^22
ray:90;ray:98^456^30
Tom:81^222^33
The data should end up in the database in the following format:
steven 100 567 22
steven 90 567 22
steven 99 567 22
ray 90 456 30
ray 98 456 30
Tom 81 222 33
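Specifically, if you want to return a different number of columns, or a different number of rows, for a given input row, then you need to use what Hive calls a transform. That is the case here: each input row has three fields but must produce one four-column output row per name:score pair in its first field.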
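To make the target mapping concrete, here is a minimal Python sketch of the one-row-to-many-rows expansion, applied directly to the raw '^'-delimited lines shown above. It runs outside Hive and only illustrates the intended result; the function name expand_raw_line and the variable names name and score are assumptions made for illustration, not names from the original data.

```python
def expand_raw_line(raw_line):
    """Turn one '^'-delimited raw line into a list of 4-field rows."""
    # The first field holds ';'-separated name:score pairs; the other two
    # fields are repeated unchanged on every output row.
    col1, code, age = raw_line.strip().split("^")
    rows = []
    for pair in col1.split(";"):
        name, score = pair.split(":")
        rows.append((name, score, code, age))
    return rows


sample = [
    "steven:100;steven:90;steven:99^567^22",
    "ray:90;ray:98^456^30",
    "Tom:81^222^33",
]

for raw in sample:
    for row in expand_raw_line(raw):
        print(" ".join(row))
```

The three sample lines expand to six rows, matching the desired format listed above.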
1. Create a table to store the raw data
create table u_data(col1 string, code int, age int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '^' STORED AS TEXTFILE;
2. Load the data
load data local inpath '/home/stevenxia/data1' overwrite into table u_data;
3. Write the transform script
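#!/usr/bin/python
import sys

# Hive streams each input row to stdin as one tab-separated line.
for line in sys.stdin:
    values = line.rstrip('\n').split('\t')
    # values[0] looks like "steven:100;steven:90"; emit one row per pair.
    for kv in values[0].split(';'):
        k, v = kv.split(':')[:2]
        # print(...) with a single argument works on Python 2 and Python 3.
        print('\t'.join([k, v, values[1], values[2]]))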
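Before deploying, the script's logic can be checked locally without a cluster. The sketch below assumes Hive's default streaming behavior for TRANSFORM, where input columns arrive as tab-separated lines on stdin and output lines are split on tabs into columns. The function run_transform and the in-memory streams are hypothetical stand-ins for the script and for Hive, used only for this test.

```python
import io


def run_transform(stdin, stdout):
    """Read tab-separated rows from stdin and write the expanded rows to stdout."""
    for line in stdin:
        values = line.rstrip("\n").split("\t")
        # One output row per name:score pair in the first column.
        for kv in values[0].split(";"):
            k, v = kv.split(":")[:2]
            stdout.write("\t".join([k, v, values[1], values[2]]) + "\n")


# Fake the stream Hive would send for two rows of u_data.
fake_stdin = io.StringIO("ray:90;ray:98\t456\t30\nTom:81\t222\t33\n")
fake_stdout = io.StringIO()
run_transform(fake_stdin, fake_stdout)
print(fake_stdout.getvalue(), end="")
```

Two three-column input rows go in and three four-column rows come out, which is the kind of row and column count change that requires a transform.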
4. Deploy the script to the cluster nodes at /home/stevenxia/u.py (it is invoked by path, so it must be executable)
5. Hive can now use the script
select transform(u.col1, u.code, u.age) using '/home/stevenxia/u.py' as (col1, col2, col3, col4) from (select * from u_data) as u;
Query result