Clickhouse版本21.6.5.37的分片和副本分布式安装

  • Post author:
  • Post category:其他




1. 基本介绍



1.1 不适用场景

  • 不支持事务
  • 支持但不擅长根据主键查询整行的数据
  • 支持但不擅长按行删除数据
  • 不能支持高并发



1.2 向量化执行

要访问的数据离CPU越近,则计算的速度越快,寄存器是CPU的一个储存系统,当然计算速度最快了

Clickhouse的CPU利用

SSE4.2

指令集操作寄存器来实现一个指令操作多条数据(SIMD-Single Instruction Multiple Data),从而实现向量化执行;该指令不适合用于较多分支判断的场景,应用多线程就可解决



1.3 内存的使用

  • 在内存中group by
  • 使用了L3级别的缓存



2. 安装与部署

适用于22.2.3.5版本,已经过测试



2.1 安装环境

4台centos服务器,配置ssh免密登录;本示例安装2分片、1副本的集群

zookeeper的安装可以参考我的博客:

zookeeper3.6.3分布式安装

hostname 地址 clickhouse服务 zookeeper
clickhouse1 192.168.23.41 shard1 zookeeper
clickhouse2 192.168.23.42 replica1 zookeeper
clickhouse3 192.168.23.43 shard2 zookeeper
clickhouse4 192.168.23.44 replica2

分别检查4台服务器的CPU是否支持SSE 4.2的命令,如下所示:

[root@clickhouse1 ~]# 
[root@clickhouse1 ~]# grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"
SSE 4.2 supported
[root@clickhouse1 ~]#



2.2 安装

分别在4台服务器上运行以下命令进行安装

[root@clickhouse1 ~]# wget https://repo.clickhouse.tech/tgz/stable/clickhouse-common-static-21.6.5.37.tgz
[root@clickhouse1 ~]# wget https://repo.clickhouse.tech/tgz/stable/clickhouse-common-static-dbg-21.6.5.37.tgz
[root@clickhouse1 ~]# wget https://repo.clickhouse.tech/tgz/stable/clickhouse-server-21.6.5.37.tgz
[root@clickhouse1 ~]# wget https://repo.clickhouse.tech/tgz/stable/clickhouse-client-21.6.5.37.tgz

[root@clickhouse1 ~]# tar -zxvf clickhouse-common-static-21.6.5.37.tgz 
[root@clickhouse1 ~]# tar -zxvf clickhouse-common-static-dbg-21.6.5.37.tgz
[root@clickhouse1 ~]# tar -zxvf clickhouse-server-21.6.5.37.tgz
[root@clickhouse1 ~]# tar -zxvf clickhouse-client-21.6.5.37.tgz

[root@clickhouse1 ~]# sh clickhouse-common-static-21.6.5.37/install/doinst.sh
[root@clickhouse1 ~]# sh clickhouse-common-static-dbg-21.6.5.37/install/doinst.sh
[root@clickhouse1 ~]# 
[root@clickhouse1 ~]# sh clickhouse-server-21.6.5.37/install/doinst.sh 
ClickHouse binary is already located at /usr/bin/clickhouse
Symlink /usr/bin/clickhouse-server already exists but it points to /root/clickhouse. Will replace the old symlink to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-server to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-client to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-local to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-benchmark to /usr/bin/clickhouse.
Symlink /usr/bin/clickhouse-copier already exists but it points to /root/clickhouse. Will replace the old symlink to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-copier to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-obfuscator to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-git-import to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-compressor to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-format to /usr/bin/clickhouse.
Symlink /usr/bin/clickhouse-extract-from-config already exists but it points to /root/clickhouse. Will replace the old symlink to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-extract-from-config to /usr/bin/clickhouse.
Creating clickhouse group if it does not exist.
 groupadd -r clickhouse
Creating clickhouse user if it does not exist.
 useradd -r --shell /bin/false --home-dir /nonexistent -g clickhouse clickhouse
Will set ulimits for clickhouse user in /etc/security/limits.d/clickhouse.conf.
Creating config directory /etc/clickhouse-server/config.d that is used for tweaks of main server configuration.
Creating config directory /etc/clickhouse-server/users.d that is used for tweaks of users configuration.
Config file /etc/clickhouse-server/config.xml already exists, will keep it and extract path info from it.
/etc/clickhouse-server/config.xml has /var/lib/clickhouse/ as data path.
/etc/clickhouse-server/config.xml has /var/log/clickhouse-server/ as log path.
Users config file /etc/clickhouse-server/users.xml already exists, will keep it and extract users info from it.
 chown --recursive clickhouse:clickhouse '/etc/clickhouse-server'
Creating log directory /var/log/clickhouse-server/.
Creating data directory /var/lib/clickhouse/.
Creating pid directory /var/run/clickhouse-server.
 chown --recursive clickhouse:clickhouse '/var/log/clickhouse-server/'
 chown --recursive clickhouse:clickhouse '/var/run/clickhouse-server'
 chown clickhouse:clickhouse '/var/lib/clickhouse/'
Enter password for default user: 
Password for default user is saved in file /etc/clickhouse-server/users.d/default-password.xml.
Setting capabilities for clickhouse binary. This is optional.
Allow server to accept connections from the network (default is localhost only), [y/N]: y
The choice is saved in file /etc/clickhouse-server/config.d/listen.xml.

ClickHouse has been successfully installed.

Start clickhouse-server with:
 sudo clickhouse start

Start clickhouse-client with:
 clickhouse-client --password

Created symlink from /etc/systemd/system/multi-user.target.wants/clickhouse-server.service to /etc/systemd/system/clickhouse-server.service.
[root@clickhouse1 ~]# 
[root@clickhouse1 ~]# sh clickhouse-client-21.6.5.37/install/doinst.sh

这里我们设置了default用户的密码,设置client可以从任何地址访问



2.3 配置

分别在4台服务器进行以下操作



2.3.1 metrika.xml


注意:本示例的集群有副本,表引擎必须为Replicated…MergeTree;如果使用…MergeTree表引擎,则通过Distributed表引擎插入数据,只向shard插入数据,而不会向副本插入数据;当通过Distributed表查询数据时,可能会从shard查询,也可能会从副本查询,导致每次查询数据不一致

新建metrika.xml, 配置权限为644, 内容如下

[root@clickhouse1 config.d]# 
[root@clickhouse1 config.d]# pwd
/etc/clickhouse-server/config.d
[root@clickhouse1 config.d]#
[root@clickhouse1 config.d]# touch metrika.xml
[root@clickhouse1 config.d]#
[root@clickhouse1 config.d]# ll
总用量 4
-rw-r--r--. 1 root root 53 7月   6 16:54 listen.xml
-rw-r--r--. 1 root root  0 7月   6 18:46 metrika.xml
[root@clickhouse1 config.d]#
[root@clickhouse1 config.d]# vi metrika.xml
[root@clickhouse1 config.d]# cat metrika.xml
<?xml version="1.0"?>
<yandex>

    <zookeeper-servers>
        <node>
            <host>clickhouse1</host>
            <port>2181</port>
        </node>
        <node>
            <host>clickhouse2</host>
            <port>2181</port>
        </node>
        <node>
            <host>clickhouse3</host>
            <port>2181</port>
        </node>
    </zookeeper-servers>

    <clickhouse_remote_servers>
        <sharding_ha>   <!-- 自定义集群名称 -->
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>clickhouse1</host>  <!-- 必须为ifconfig查询到的地址 -->
                    <port>9000</port>
                    <user>default</user>
                    <password>default123</password>
                    <weight>1</weight>
                </replica>
                <replica>
                    <host>clickhouse2</host>
                    <port>9000</port>
                    <user>default</user>
                    <password>default123</password>
                    <weight>1</weight>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>clickhouse3</host>
                    <port>9000</port>
                    <user>default</user>
                    <password>default123</password>
                    <weight>1</weight>
                </replica>
                <replica>
                    <host>clickhouse4</host>
                    <port>9000</port>
                    <user>default</user>
                    <password>default123</password>
                    <weight>1</weight>
                </replica>

            </shard>
        </sharding_ha>
    </clickhouse_remote_servers>


    <!-- clickhouse2服务器为[01, clickhouse2], clickhouse3服务器为[02, clickhouse3], clickhouse4服务器为[02, clickhouse4] -->
    <!-- 建立分布式表时,表引擎引用的宏参数 -->
    <macros_parameter>
        <shard>01</shard>
        <replica>clickhouse1</replica>
    </macros_parameter>



</yandex>

[root@clickhouse1 config.d]#



2.3.2 config.xml

新建clickhouse数据目录

[root@clickhouse1 ~]# 
[root@clickhouse1 ~]# mkdir clickhouse
[root@clickhouse1 ~]# 


编辑/etc/clickhouse-server/config.xml

删除部分


    <remote_servers>
    ......省略部分......
    </remote_servers>


修改部分

    <tcp_port>9000</tcp_port>
    
    <path>/root/clickhouse/</path>
    <tmp_path>/root/clickhouse/tmp/</tmp_path>
    <user_files_path>/root/clickhouse/user_files/</user_files_path>

    <user_directories>
        ......省略部分......
        <local_directory>
            <!-- Path to folder where users created by SQL commands are stored. -->
            <path>/root/clickhouse/access/</path>
        </local_directory>
        ......省略部分......
    <user_directories>

    <format_schema_path>/root/clickhouse/format_schemas/</format_schema_path>

添加部分

    <include_from>/etc/clickhouse-server/config.d/metrika.xml</include_from>
    <zookeeper incl="zookeeper-servers" optional="true" />
    <remote_servers incl="clickhouse_remote_servers" optional="true" />
    <macros incl="macros_parameter" optional="true" />
    
    <database_atomic_delay_before_drop_table_sec>0</database_atomic_delay_before_drop_table_sec>




2.4 启动

分别在4台服务器执行以下操作

[root@clickhouse1 ~]# 
[root@clickhouse1 ~]# clickhouse-server --config=/etc/clickhouse-server/config.xml --daemon start
[root@clickhouse1 ~]# 
[root@clickhouse1 ~]# cat /var/log/clickhouse-server/clickhouse-server.log | grep "Ready for connections"
2021.07.07 22:56:14.859761 [ 1692 ] {} <Information> Application: Ready for connections.
[root@clickhouse1 ~]#

如果需要停止,则用命令

clickhouse stop



3. 系统表查询



3.1 zookeeper

clickhouse提供了zookeeper的代理表,可以通过SQL直接查询zookeeper的数据,如下所示:

[root@clickhouse1 ~]#
[root@clickhouse1 ~]# clickhouse-client -h clickhouse1 --port 9000 -u default --password default123 -m -n
ClickHouse client version 21.6.5.37 (official build).
Connecting to clickhouse1:9000 as user default.
Connected to ClickHouse server version 21.6.5 revision 54448.

clickhouse1 :)
clickhouse1 :) select * from system.zookeeper where path = '/';

SELECT *
FROM system.zookeeper
WHERE path = '/'

Query id: 75141092-0cf8-40e5-84f2-9cd6c47194b9

┌─name───────┬─value─┬───────czxid─┬───────mzxid─┬───────────────ctime─┬───────────────mtime─┬─version─┬─cversion─┬─aversion─┬─ephemeralOwner─┬─dataLength─┬─numChildren─┬───────pzxid─┬─path─┐
│ zookeeper  │       │           0 │           0 │ 1970-01-01 08:00:00 │ 1970-01-01 08:00:00 │       0 │       -2 │        0 │              0 │          0 │           2 │           0 │ /    │
│ clickhouse │       │ 21474836482 │ 21474836482 │ 2021-07-07 04:01:31 │ 2021-07-07 04:01:31 │       0 │        1 │        0 │              0 │          0 │           1 │ 21474836483 │ /    │
└────────────┴───────┴─────────────┴─────────────┴─────────────────────┴─────────────────────┴─────────┴──────────┴──────────┴────────────────┴────────────┴─────────────┴─────────────┴──────┘

2 rows in set. Elapsed: 0.023 sec. 

clickhouse1 :)
clickhouse1 :) select name, value, czxid, mzxid from system.zookeeper where path = '/clickhouse';

SELECT
    name,
    value,
    czxid,
    mzxid
FROM system.zookeeper
WHERE path = '/clickhouse'

Query id: fa1e1772-a455-46ae-b5ba-2b70390c9a81

┌─name───────┬─value─┬───────czxid─┬───────mzxid─┐
│ task_queue │       │ 21474836483 │ 21474836483 │
└────────────┴───────┴─────────────┴─────────────┘

1 rows in set. Elapsed: 0.014 sec. 

clickhouse1 :) 



3.2 clusters

clickhouse1 :) 
clickhouse1 :) select cluster, host_name from system.clusters;

SELECT
    cluster,
    host_name
FROM system.clusters

Query id: 8430bc2a-c6fd-4e4d-9b57-8bf38ee6cfad

┌─cluster─────┬─host_name───┐
│ sharding_ha │ clickhouse1 │
│ sharding_ha │ clickhouse2 │
│ sharding_ha │ clickhouse3 │
│ sharding_ha │ clickhouse4 │
└─────────────┴─────────────┘

4 rows in set. Elapsed: 0.010 sec. 

clickhouse1 :) 



3.3 macros

查询clickhouse1

clickhouse1 :)
clickhouse1 :) select * from system.macros;

SELECT *
FROM system.macros

Query id: 8241bdf6-c198-44b7-a0ca-fcda123a9229

┌─macro───┬─substitution─┐
│ replica │ clickhouse1  │
│ shard   │ 01           │
└─────────┴──────────────┘

2 rows in set. Elapsed: 0.007 sec. 

clickhouse1 :)

连接clickhouse1时,远程查询clickhouse2

clickhouse1 :)
clickhouse1 :) select * from remote('clickhouse2:9000', 'system', 'macros', 'default', 'default123');

SELECT *
FROM remote('clickhouse2:9000', 'system', 'macros', 'default', 'default123')

Query id: 6c1882d5-8736-46ec-bfdf-76644a443244

┌─macro───┬─substitution─┐
│ replica │ clickhouse2  │
│ shard   │ 01           │
└─────────┴──────────────┘

2 rows in set. Elapsed: 0.078 sec. 

clickhouse1 :)



版权声明:本文为yy8623977原创文章,遵循 CC 4.0 BY-SA 版权协议,转载请附上原文出处链接和本声明。