作者:11 | 来源:互联网 | 2023-09-24 19:08
原始数据:
[{"name":"woods","app_id":"abc123"},{"name":"tiger","app_id":"def456"}]
get_json_object
函数的作用:用来解析json字符串的一个字段:
select get_json_object(flist,'$.filtertype') as filtertype
,get_json_object(flist,'$.filtersubtype')as filtersubtype
,get_json_object(flist,'$.filtername') as filtername
from aaaaaaa
json_tuple
函数的作用:用来解析json字符串中的多个字段
select a.flist
,b.filtertype
,b.filtersubtype
,b.filtername
from aaaaaaaaa a
lateral view json_tuple(flist,'filtertype', 'filtersubtype', 'filtername') b as
hive用法
没有别的好办法,肯定提取“[]”之间的json字符串之后、后进行分割,逗号分割不行,逗号太多,而我们只想两个大括号之间的逗号做分割“ },{”,所以要把逗号替换成json数组中不存在的其它字符,之后再进行分割,如下,五步走:
1、regexp_extract提取[]内的json字符串
2、regexp_replace替换为逗号为自定义分隔符
3、split根据自定义分隔符分割
4、explode行转列函数
5、get_json_object解析json取字段值
SELECT
get_json_object(str_json, '$.sku') AS sid
FROM
(
SELECT
event_param_json,
split(regexp_replace(regexp_extract(event_param_json, '(\\[)(.*?)(\\])', 2), '\\},\\{', '\\}#\\{'), '\\#') AS json_list
FROM
gdm.gdm_m14_wireless_exposure_log
WHERE
dt = '2020-09-06'
AND event_id = 'NewManChannel_RecProExpo'
AND event_param_json LIKE '%uss%'
)
a lateral VIEW explode(json_list) list_tab AS str_json
或者如下SQL都可以(两种SQL只是提取中括号[]间json字符串的正则不一样)
SELECT
GET_JSON_OBJECT(json_str, '$.sku') AS sid
FROM
(
SELECT
json_str
FROM
(
SELECT
split(regexp_replace(regexp_extract(event_param_json -- 获取data数组,格式[{json},{json}]
, '^\\[(.+)\\]$', 1) -- 删除字符串前后的[],格式{json},{json}
, '\\}\\,\\{', '\\}\\|\\|\\{') -- 将josn字符串中的分隔符代换成||,格式{json}||{json}
, '\\|\\|') AS json_list
FROM
gdm.gdm_m14_wireless_exposure_log
WHERE
dt = '2020-09-06'
AND page_id = 'NewManChannel_Home'
-- and event_id = 'NewManChannel_RecProExpo'
AND event_param_json LIKE '%uss%'
)
a lateral VIEW explode(json_list) list_tab AS json_str
)
t