目录
1. 基本介绍
1.1 不适用场景
- 不支持事务
- 支持但不擅长根据主键查询整行的数据
- 支持但不擅长按行删除数据
- 不能支持高并发
1.2 向量化执行
要访问的数据离CPU越近,则计算的速度越快,寄存器是CPU的一个储存系统,当然计算速度最快了
Clickhouse的CPU利用
SSE4.2
指令集操作寄存器来实现一个指令操作多条数据(SIMD-Single Instruction Multiple Data),从而实现向量化执行;该指令不适合用于较多分支判断的场景,应用多线程就可解决
1.3 内存的使用
- 在内存中group by
- 使用了L3级别的缓存
2. 安装与部署
适用于22.2.3.5版本,已经过测试
2.1 安装环境
4台centos服务器,配置ssh免密登录;本示例安装2分片、1副本的集群
zookeeper的安装可以参考我的博客:
zookeeper3.6.3分布式安装
hostname | 地址 | clickhouse服务 | zookeeper |
---|---|---|---|
clickhouse1 | 192.168.23.41 | shard1 | zookeeper |
clickhouse2 | 192.168.23.42 | replica1 | zookeeper |
clickhouse3 | 192.168.23.43 | shard2 | zookeeper |
clickhouse4 | 192.168.23.44 | replica2 |
分别检查4台服务器的CPU是否支持SSE 4.2的命令,如下所示:
[root@clickhouse1 ~]#
[root@clickhouse1 ~]# grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"
SSE 4.2 supported
[root@clickhouse1 ~]#
2.2 安装
分别在4台服务器上运行以下命令进行安装
[root@clickhouse1 ~]# wget https://repo.clickhouse.tech/tgz/stable/clickhouse-common-static-21.6.5.37.tgz
[root@clickhouse1 ~]# wget https://repo.clickhouse.tech/tgz/stable/clickhouse-common-static-dbg-21.6.5.37.tgz
[root@clickhouse1 ~]# wget https://repo.clickhouse.tech/tgz/stable/clickhouse-server-21.6.5.37.tgz
[root@clickhouse1 ~]# wget https://repo.clickhouse.tech/tgz/stable/clickhouse-client-21.6.5.37.tgz
[root@clickhouse1 ~]# tar -zxvf clickhouse-common-static-21.6.5.37.tgz
[root@clickhouse1 ~]# tar -zxvf clickhouse-common-static-dbg-21.6.5.37.tgz
[root@clickhouse1 ~]# tar -zxvf clickhouse-server-21.6.5.37.tgz
[root@clickhouse1 ~]# tar -zxvf clickhouse-client-21.6.5.37.tgz
[root@clickhouse1 ~]# sh clickhouse-common-static-21.6.5.37/install/doinst.sh
[root@clickhouse1 ~]# sh clickhouse-common-static-dbg-21.6.5.37/install/doinst.sh
[root@clickhouse1 ~]#
[root@clickhouse1 ~]# sh clickhouse-server-21.6.5.37/install/doinst.sh
ClickHouse binary is already located at /usr/bin/clickhouse
Symlink /usr/bin/clickhouse-server already exists but it points to /root/clickhouse. Will replace the old symlink to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-server to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-client to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-local to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-benchmark to /usr/bin/clickhouse.
Symlink /usr/bin/clickhouse-copier already exists but it points to /root/clickhouse. Will replace the old symlink to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-copier to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-obfuscator to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-git-import to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-compressor to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-format to /usr/bin/clickhouse.
Symlink /usr/bin/clickhouse-extract-from-config already exists but it points to /root/clickhouse. Will replace the old symlink to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-extract-from-config to /usr/bin/clickhouse.
Creating clickhouse group if it does not exist.
groupadd -r clickhouse
Creating clickhouse user if it does not exist.
useradd -r --shell /bin/false --home-dir /nonexistent -g clickhouse clickhouse
Will set ulimits for clickhouse user in /etc/security/limits.d/clickhouse.conf.
Creating config directory /etc/clickhouse-server/config.d that is used for tweaks of main server configuration.
Creating config directory /etc/clickhouse-server/users.d that is used for tweaks of users configuration.
Config file /etc/clickhouse-server/config.xml already exists, will keep it and extract path info from it.
/etc/clickhouse-server/config.xml has /var/lib/clickhouse/ as data path.
/etc/clickhouse-server/config.xml has /var/log/clickhouse-server/ as log path.
Users config file /etc/clickhouse-server/users.xml already exists, will keep it and extract users info from it.
chown --recursive clickhouse:clickhouse '/etc/clickhouse-server'
Creating log directory /var/log/clickhouse-server/.
Creating data directory /var/lib/clickhouse/.
Creating pid directory /var/run/clickhouse-server.
chown --recursive clickhouse:clickhouse '/var/log/clickhouse-server/'
chown --recursive clickhouse:clickhouse '/var/run/clickhouse-server'
chown clickhouse:clickhouse '/var/lib/clickhouse/'
Enter password for default user:
Password for default user is saved in file /etc/clickhouse-server/users.d/default-password.xml.
Setting capabilities for clickhouse binary. This is optional.
Allow server to accept connections from the network (default is localhost only), [y/N]: y
The choice is saved in file /etc/clickhouse-server/config.d/listen.xml.
ClickHouse has been successfully installed.
Start clickhouse-server with:
sudo clickhouse start
Start clickhouse-client with:
clickhouse-client --password
Created symlink from /etc/systemd/system/multi-user.target.wants/clickhouse-server.service to /etc/systemd/system/clickhouse-server.service.
[root@clickhouse1 ~]#
[root@clickhouse1 ~]# sh clickhouse-client-21.6.5.37/install/doinst.sh
这里我们设置了default用户的密码,设置client可以从任何地址访问
2.3 配置
分别在4台服务器进行以下操作
2.3.1 metrika.xml
注意:本示例的集群有副本,表引擎必须为Replicated…MergeTree;如果使用…MergeTree表引擎,则通过Distributed表引擎插入数据,只向shard插入数据,而不会向副本插入数据;当通过Distributed表查询数据时,可能会从shard查询,也可能会从副本查询,导致每次查询数据不一致
新建metrika.xml, 配置权限为644, 内容如下
[root@clickhouse1 config.d]#
[root@clickhouse1 config.d]# pwd
/etc/clickhouse-server/config.d
[root@clickhouse1 config.d]#
[root@clickhouse1 config.d]# touch metrika.xml
[root@clickhouse1 config.d]#
[root@clickhouse1 config.d]# ll
总用量 4
-rw-r--r--. 1 root root 53 7月 6 16:54 listen.xml
-rw-r--r--. 1 root root 0 7月 6 18:46 metrika.xml
[root@clickhouse1 config.d]#
[root@clickhouse1 config.d]# vi metrika.xml
[root@clickhouse1 config.d]# cat metrika.xml
<?xml version="1.0"?>
<yandex>
<zookeeper-servers>
<node>
<host>clickhouse1</host>
<port>2181</port>
</node>
<node>
<host>clickhouse2</host>
<port>2181</port>
</node>
<node>
<host>clickhouse3</host>
<port>2181</port>
</node>
</zookeeper-servers>
<clickhouse_remote_servers>
<sharding_ha> <!-- 自定义集群名称 -->
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>clickhouse1</host> <!-- 必须为ifconfig查询到的地址 -->
<port>9000</port>
<user>default</user>
<password>default123</password>
<weight>1</weight>
</replica>
<replica>
<host>clickhouse2</host>
<port>9000</port>
<user>default</user>
<password>default123</password>
<weight>1</weight>
</replica>
</shard>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>clickhouse3</host>
<port>9000</port>
<user>default</user>
<password>default123</password>
<weight>1</weight>
</replica>
<replica>
<host>clickhouse4</host>
<port>9000</port>
<user>default</user>
<password>default123</password>
<weight>1</weight>
</replica>
</shard>
</sharding_ha>
</clickhouse_remote_servers>
<!-- clickhouse2服务器为[01, clickhouse2], clickhouse3服务器为[02, clickhouse3], clickhouse4服务器为[02, clickhouse4] -->
<!-- 建立分布式表时,表引擎引用的宏参数 -->
<macros_parameter>
<shard>01</shard>
<replica>clickhouse1</replica>
</macros_parameter>
</yandex>
[root@clickhouse1 config.d]#
2.3.2 config.xml
新建clickhouse数据目录
[root@clickhouse1 ~]#
[root@clickhouse1 ~]# mkdir clickhouse
[root@clickhouse1 ~]#
编辑/etc/clickhouse-server/config.xml
删除部分
<remote_servers>
......省略部分......
</remote_servers>
修改部分
<tcp_port>9000</tcp_port>
<path>/root/clickhouse/</path>
<tmp_path>/root/clickhouse/tmp/</tmp_path>
<user_files_path>/root/clickhouse/user_files/</user_files_path>
<user_directories>
......省略部分......
<local_directory>
<!-- Path to folder where users created by SQL commands are stored. -->
<path>/root/clickhouse/access/</path>
</local_directory>
......省略部分......
<user_directories>
<format_schema_path>/root/clickhouse/format_schemas/</format_schema_path>
添加部分
<include_from>/etc/clickhouse-server/config.d/metrika.xml</include_from>
<zookeeper incl="zookeeper-servers" optional="true" />
<remote_servers incl="clickhouse_remote_servers" optional="true" />
<macros incl="macros_parameter" optional="true" />
<database_atomic_delay_before_drop_table_sec>0</database_atomic_delay_before_drop_table_sec>
2.4 启动
分别在4台服务器执行以下操作
[root@clickhouse1 ~]#
[root@clickhouse1 ~]# clickhouse-server --config=/etc/clickhouse-server/config.xml --daemon start
[root@clickhouse1 ~]#
[root@clickhouse1 ~]# cat /var/log/clickhouse-server/clickhouse-server.log | grep "Ready for connections"
2021.07.07 22:56:14.859761 [ 1692 ] {} <Information> Application: Ready for connections.
[root@clickhouse1 ~]#
如果需要停止,则用命令
clickhouse stop
3. 系统表查询
3.1 zookeeper
clickhouse提供了zookeeper的代理表,可以通过SQL直接查询zookeeper的数据,如下所示:
[root@clickhouse1 ~]#
[root@clickhouse1 ~]# clickhouse-client -h clickhouse1 --port 9000 -u default --password default123 -m -n
ClickHouse client version 21.6.5.37 (official build).
Connecting to clickhouse1:9000 as user default.
Connected to ClickHouse server version 21.6.5 revision 54448.
clickhouse1 :)
clickhouse1 :) select * from system.zookeeper where path = '/';
SELECT *
FROM system.zookeeper
WHERE path = '/'
Query id: 75141092-0cf8-40e5-84f2-9cd6c47194b9
┌─name───────┬─value─┬───────czxid─┬───────mzxid─┬───────────────ctime─┬───────────────mtime─┬─version─┬─cversion─┬─aversion─┬─ephemeralOwner─┬─dataLength─┬─numChildren─┬───────pzxid─┬─path─┐
│ zookeeper │ │ 0 │ 0 │ 1970-01-01 08:00:00 │ 1970-01-01 08:00:00 │ 0 │ -2 │ 0 │ 0 │ 0 │ 2 │ 0 │ / │
│ clickhouse │ │ 21474836482 │ 21474836482 │ 2021-07-07 04:01:31 │ 2021-07-07 04:01:31 │ 0 │ 1 │ 0 │ 0 │ 0 │ 1 │ 21474836483 │ / │
└────────────┴───────┴─────────────┴─────────────┴─────────────────────┴─────────────────────┴─────────┴──────────┴──────────┴────────────────┴────────────┴─────────────┴─────────────┴──────┘
2 rows in set. Elapsed: 0.023 sec.
clickhouse1 :)
clickhouse1 :) select name, value, czxid, mzxid from system.zookeeper where path = '/clickhouse';
SELECT
name,
value,
czxid,
mzxid
FROM system.zookeeper
WHERE path = '/clickhouse'
Query id: fa1e1772-a455-46ae-b5ba-2b70390c9a81
┌─name───────┬─value─┬───────czxid─┬───────mzxid─┐
│ task_queue │ │ 21474836483 │ 21474836483 │
└────────────┴───────┴─────────────┴─────────────┘
1 rows in set. Elapsed: 0.014 sec.
clickhouse1 :)
3.2 clusters
clickhouse1 :)
clickhouse1 :) select cluster, host_name from system.clusters;
SELECT
cluster,
host_name
FROM system.clusters
Query id: 8430bc2a-c6fd-4e4d-9b57-8bf38ee6cfad
┌─cluster─────┬─host_name───┐
│ sharding_ha │ clickhouse1 │
│ sharding_ha │ clickhouse2 │
│ sharding_ha │ clickhouse3 │
│ sharding_ha │ clickhouse4 │
└─────────────┴─────────────┘
4 rows in set. Elapsed: 0.010 sec.
clickhouse1 :)
3.3 macros
查询clickhouse1
clickhouse1 :)
clickhouse1 :) select * from system.macros;
SELECT *
FROM system.macros
Query id: 8241bdf6-c198-44b7-a0ca-fcda123a9229
┌─macro───┬─substitution─┐
│ replica │ clickhouse1 │
│ shard │ 01 │
└─────────┴──────────────┘
2 rows in set. Elapsed: 0.007 sec.
clickhouse1 :)
连接clickhouse1时,远程查询clickhouse2
clickhouse1 :)
clickhouse1 :) select * from remote('clickhouse2:9000', 'system', 'macros', 'default', 'default123');
SELECT *
FROM remote('clickhouse2:9000', 'system', 'macros', 'default', 'default123')
Query id: 6c1882d5-8736-46ec-bfdf-76644a443244
┌─macro───┬─substitution─┐
│ replica │ clickhouse2 │
│ shard │ 01 │
└─────────┴──────────────┘
2 rows in set. Elapsed: 0.078 sec.
clickhouse1 :)