CLICKHOUSE中JSON解析
Clickhouse一般使用字符串行式保存JSON数据,clickhouse提供了JSON函数,方便我们提取JSON数据。
如要提取CONTENT字符串中保存的JSON数据,DATA的所有字段。
{
"SENDER":"operation",
"TABLE":"team_goal",
"TIME":1675844952009,
"DATA":{
"data":{
"_id":"63d78bb319cef40007220f45",
"appId":"62e9d317c653b800076fc089",
"createTime":"2023-01-30T09:19:47.924Z",
"creator":{
"_id":"5c9c6562c9c631ec0899fbe8",
"name":"李雷",
"status":1,
"type":0,
"username":"1551345709528115"
},
"deleteTime":null,
"deleter":null,
"entryId":"6350bed4ff61f1000a167f33",
"formName":"任务",
"goal_amount":8200000,
"goal_channel":"销售",
"goal_date":"2023-11-30T16:00:00.000Z",
"team_code":"100009",
"team_name":"北京分公司",
"updateTime":"2023-02-08T08:29:11.892Z",
"updater":{
"_id":"628bd2ca2aff520007a6b39d",
"name":"韩梅梅",
"status":1,
"type":0,
"username":"264125661026134153"
}
},
"op":"data_update"
}
}
该JSON嵌套了4层,我们应该怎么样去提取?从结构上看,可以利用JSONExtractRaw函数将DATA提取出来。
JSONExtractRaw(json[, indices_or_keys]…)
JSONExtractRaw函数可以将一部分 JSON 作为未分析的字符串返回。如果该部件不存在或类型错误,则将返回空字符串。
WITH JSONExtractRaw(replaceAll(CONTENT, '\n', '<br/>'), 'DATA') as DATA
SELECT
DATA
FROM default.test;
提取结果:
{
"data":{
"_id":"63d78bb319cef40007220f45",
"appId":"62e9d317c653b800076fc089",
"createTime":"2023-01-30T09:19:47.924Z",
"creator":{
"_id":"5c9c6562c9c631ec0899fbe8",
"name":"李雷",
"status":1,
"type":0,
"username":"1551345709528115"
},
"deleteTime":null,
"deleter":null,
"entryId":"6350bed4ff61f1000a167f33",
"formName":"任务",
"goal_amount":8200000,
"goal_channel":"销售",
"goal_date":"2023-11-30T16:00:00.000Z",
"team_code":"100009",
"team_name":"北京分公司",
"updateTime":"2023-02-08T08:29:11.892Z",
"updater":{
"_id":"628bd2ca2aff520007a6b39d",
"name":"韩梅梅",
"status":1,
"type":0,
"username":"264125661026134153"
}
},
"op":"data_update"
}
此时可以看到已经将DATA从CONTENT中提取出来,同理可以将data从DATA中提取出来
{
"_id":"63d78bb319cef40007220f45",
"appId":"62e9d317c653b800076fc089",
"createTime":"2023-01-30T09:19:47.924Z",
"creator":{
"_id":"5c9c6562c9c631ec0899fbe8",
"name":"李雷",
"status":1,
"type":0,
"username":"1551345709528115"
},
"deleteTime":null,
"deleter":null,
"entryId":"6350bed4ff61f1000a167f33",
"formName":"任务",
"goal_amount":8200000,
"goal_channel":"销售",
"goal_date":"2023-11-30T16:00:00.000Z",
"team_code":"100009",
"team_name":"北京分公司",
"updateTime":"2023-02-08T08:29:11.892Z",
"updater":{
"_id":"628bd2ca2aff520007a6b39d",
"name":"韩梅梅",
"status":1,
"type":0,
"username":"264125661026134153"
}
}
接下来可以对data中的每个字段进行提取。在此之前先来说说CLICKHOUSE中常见的几个JSON提取函数。
–1.抽取JSON中返回值为String类型的值
SELECT
JSONExtractString('{"a": "hello", "b": [-100, 200.0, 300]}', 'a') AS a;
返回
–2.抽取JSON中返回值为Float类型的值
SELECT JSONExtractFloat('{"a": "hello", "b": [-100, 200.0, 300],"f":1.01}', 'f') as f;
返回
–3.抽取JSON中返回值为Int类型的值
SELECT JSONExtractInt('{"a": "hello", "b": [-100, 200.0, 300],"i":100}','i') as i;
返回
–4.抽取JSON中返回值为UInt类型的值
SELECT JSONExtractUInt('{"a": "hello", "b": [-100, 200.0, 300],"i":-100}','i') as i0,
JSONExtractUInt('{"a": "hello", "b": [-100, 200.0, 300],"i":100}','i') as i;
返回
这些函数也可以换个写法,例如
–提取字段类型为String
SELECT JSONExtract('{"a": "hello", "b": [-100, 200.0, 300]}', 'a', 'String') AS a
返回
–提取字段类型为Float
SELECT JSONExtract('{"a": "hello", "b": [-100, 200.0, 300],"f":1.01}', 'f','Float') as f
返回
–提取字段类型为UInt
SELECT
JSONExtract('{"a": "hello", "b": [-100, 200.0, 300],"i":-100}', 'i', 'UInt64') AS i0,
JSONExtractUInt('{"a": "hello", "b": [-100, 200.0, 300],"i":100}', 'i') AS i
返回
CLICKHOUSE中,没有提供Decimal类型的JSON提取函数,当要提取的字段类型为Decimal时,可以这样写
SELECT JSONExtract('{"a": "hello", "b": [-100, 200.0, 300],"f":1.0111111}', 'f', 'Decimal(18,2)') AS f
返回
即第三个参数直接指定该字段的类型。
下面开始提取data中的字段。
WITH JSONExtractRaw(JSONExtractRaw(replaceAll(CONTENT, '\n', '<br/>'), 'DATA'), 'data') AS data
SELECT
JSONExtractString(data, '_id') AS _id,
JSONExtractString(JSONExtractRaw(replaceAll(CONTENT, '\n', '<br/>'), 'DATA'), 'op') AS _op,
addHours(toDateTime64(replaceAll(JSONExtractString(data, 'deleteTime'), 'Z', ''), 3), 8) AS _delete_time,
if(toYear(_delete_time) > 1970, _delete_time, addHours(toDateTime64(replaceAll(JSONExtractString(data, 'updateTime'), 'Z', ''), 3), 8)) AS _update_time,
if(empty(JSONExtract(data, 'team_name', 'Nullable(String)')) = 0, JSONExtract(data, 'team_name', 'Nullable(String)'), NULL) AS team_name,
toInt64(JSONExtractString(data, 'team_code')) AS team_code,
addHours(toDateTime64(replaceAll(JSONExtractString(data, 'goal_date'), 'Z', ''), 3), 8) AS goal_date,
JSONExtractString(data, 'goal_channel') AS goal_channel,
toDecimal64(JSONExtractFloat(data, 'goal_amount'), 2) AS goal_amount,
JSONExtractRaw(data, 'creator') AS creator,
JSONExtractString(creator, '_id') AS fill_id,
JSONExtractString(creator, 'name') AS fill_name,
JSONExtractString(data, 'formName') AS formName,
addTime
FROM default.test
提取结果