CLICKHOUSE中JSON解析

  • Post author:
  • Post category:其他




CLICKHOUSE中JSON解析

Clickhouse一般使用字符串行式保存JSON数据,clickhouse提供了JSON函数,方便我们提取JSON数据。

如要提取CONTENT字符串中保存的JSON数据,DATA的所有字段。

在这里插入图片描述

{
    "SENDER":"operation",
    "TABLE":"team_goal",
    "TIME":1675844952009,
    "DATA":{
        "data":{
            "_id":"63d78bb319cef40007220f45",
            "appId":"62e9d317c653b800076fc089",
            "createTime":"2023-01-30T09:19:47.924Z",
            "creator":{
                "_id":"5c9c6562c9c631ec0899fbe8",
                "name":"李雷",
                "status":1,
                "type":0,
                "username":"1551345709528115"
            },
            "deleteTime":null,
            "deleter":null,
            "entryId":"6350bed4ff61f1000a167f33",
            "formName":"任务",
            "goal_amount":8200000,
            "goal_channel":"销售",
            "goal_date":"2023-11-30T16:00:00.000Z",
            "team_code":"100009",
            "team_name":"北京分公司",
            "updateTime":"2023-02-08T08:29:11.892Z",
            "updater":{
                "_id":"628bd2ca2aff520007a6b39d",
                "name":"韩梅梅",
                "status":1,
                "type":0,
                "username":"264125661026134153"
            }
        },
        "op":"data_update"
    }
}

该JSON嵌套了4层,我们应该怎么样去提取?从结构上看,可以利用JSONExtractRaw函数将DATA提取出来。

JSONExtractRaw(json[, indices_or_keys]…)

JSONExtractRaw函数可以将一部分 JSON 作为未分析的字符串返回。如果该部件不存在或类型错误,则将返回空字符串。

WITH JSONExtractRaw(replaceAll(CONTENT, '\n', '<br/>'), 'DATA') as DATA 
SELECT
	DATA
FROM default.test;

提取结果:

在这里插入图片描述

{
    "data":{
        "_id":"63d78bb319cef40007220f45",
        "appId":"62e9d317c653b800076fc089",
        "createTime":"2023-01-30T09:19:47.924Z",
        "creator":{
            "_id":"5c9c6562c9c631ec0899fbe8",
            "name":"李雷",
            "status":1,
            "type":0,
            "username":"1551345709528115"
        },
        "deleteTime":null,
        "deleter":null,
        "entryId":"6350bed4ff61f1000a167f33",
        "formName":"任务",
        "goal_amount":8200000,
        "goal_channel":"销售",
        "goal_date":"2023-11-30T16:00:00.000Z",
        "team_code":"100009",
        "team_name":"北京分公司",
        "updateTime":"2023-02-08T08:29:11.892Z",
        "updater":{
            "_id":"628bd2ca2aff520007a6b39d",
            "name":"韩梅梅",
            "status":1,
            "type":0,
            "username":"264125661026134153"
        }
    },
    "op":"data_update"
}

此时可以看到已经将DATA从CONTENT中提取出来,同理可以将data从DATA中提取出来

在这里插入图片描述

{
    "_id":"63d78bb319cef40007220f45",
    "appId":"62e9d317c653b800076fc089",
    "createTime":"2023-01-30T09:19:47.924Z",
    "creator":{
        "_id":"5c9c6562c9c631ec0899fbe8",
        "name":"李雷",
        "status":1,
        "type":0,
        "username":"1551345709528115"
    },
    "deleteTime":null,
    "deleter":null,
    "entryId":"6350bed4ff61f1000a167f33",
    "formName":"任务",
    "goal_amount":8200000,
    "goal_channel":"销售",
    "goal_date":"2023-11-30T16:00:00.000Z",
    "team_code":"100009",
    "team_name":"北京分公司",
    "updateTime":"2023-02-08T08:29:11.892Z",
    "updater":{
        "_id":"628bd2ca2aff520007a6b39d",
        "name":"韩梅梅",
        "status":1,
        "type":0,
        "username":"264125661026134153"
    }
}

接下来可以对data中的每个字段进行提取。在此之前先来说说CLICKHOUSE中常见的几个JSON提取函数。

–1.抽取JSON中返回值为String类型的值

SELECT
    JSONExtractString('{"a": "hello", "b": [-100, 200.0, 300]}', 'a') AS a;

返回

在这里插入图片描述

–2.抽取JSON中返回值为Float类型的值

SELECT JSONExtractFloat('{"a": "hello", "b": [-100, 200.0, 300],"f":1.01}', 'f') as f;

返回

在这里插入图片描述

–3.抽取JSON中返回值为Int类型的值

SELECT JSONExtractInt('{"a": "hello", "b": [-100, 200.0, 300],"i":100}','i') as i;

返回

在这里插入图片描述

–4.抽取JSON中返回值为UInt类型的值

SELECT JSONExtractUInt('{"a": "hello", "b": [-100, 200.0, 300],"i":-100}','i') as i0,
            JSONExtractUInt('{"a": "hello", "b": [-100, 200.0, 300],"i":100}','i') as i;

返回

在这里插入图片描述

这些函数也可以换个写法,例如

–提取字段类型为String

SELECT JSONExtract('{"a": "hello", "b": [-100, 200.0, 300]}', 'a', 'String') AS a

返回

在这里插入图片描述

–提取字段类型为Float

SELECT JSONExtract('{"a": "hello", "b": [-100, 200.0, 300],"f":1.01}', 'f','Float') as f

返回
在这里插入图片描述

–提取字段类型为UInt

SELECT
    JSONExtract('{"a": "hello", "b": [-100, 200.0, 300],"i":-100}', 'i', 'UInt64') AS i0,
    JSONExtractUInt('{"a": "hello", "b": [-100, 200.0, 300],"i":100}', 'i') AS i

返回

在这里插入图片描述

CLICKHOUSE中,没有提供Decimal类型的JSON提取函数,当要提取的字段类型为Decimal时,可以这样写

SELECT JSONExtract('{"a": "hello", "b": [-100, 200.0, 300],"f":1.0111111}', 'f', 'Decimal(18,2)') AS f

返回

在这里插入图片描述

即第三个参数直接指定该字段的类型。

下面开始提取data中的字段。

WITH JSONExtractRaw(JSONExtractRaw(replaceAll(CONTENT, '\n', '<br/>'), 'DATA'), 'data') AS data
SELECT
    JSONExtractString(data, '_id') AS _id,
    JSONExtractString(JSONExtractRaw(replaceAll(CONTENT, '\n', '<br/>'), 'DATA'), 'op') AS _op,
    addHours(toDateTime64(replaceAll(JSONExtractString(data, 'deleteTime'), 'Z', ''), 3), 8) AS _delete_time,
    if(toYear(_delete_time) > 1970, _delete_time, addHours(toDateTime64(replaceAll(JSONExtractString(data, 'updateTime'), 'Z', ''), 3), 8)) AS _update_time,
    if(empty(JSONExtract(data, 'team_name', 'Nullable(String)')) = 0, JSONExtract(data, 'team_name', 'Nullable(String)'), NULL) AS team_name,
    toInt64(JSONExtractString(data, 'team_code')) AS team_code,
    addHours(toDateTime64(replaceAll(JSONExtractString(data, 'goal_date'), 'Z', ''), 3), 8) AS goal_date,
    JSONExtractString(data, 'goal_channel') AS goal_channel,
    toDecimal64(JSONExtractFloat(data, 'goal_amount'), 2) AS goal_amount,
    JSONExtractRaw(data, 'creator') AS creator,
    JSONExtractString(creator, '_id') AS fill_id,
    JSONExtractString(creator, 'name') AS fill_name,
    JSONExtractString(data, 'formName') AS formName,
    addTime
FROM default.test

提取结果

在这里插入图片描述



版权声明:本文为zjyy1987原创文章,遵循 CC 4.0 BY-SA 版权协议,转载请附上原文出处链接和本声明。