Big Data Cluster Component Tuning Parameters

1. HDFS

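1.1 NameNode memory

HDFS NameNode heap: the NameNode keeps all file system metadata in memory, so its heap must be large. The default is 1 GB; we recommend 36 GB (36864 MB).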

1.2 DataNode memory

HDFS DataNode heap: the default is 1 GB; we recommend 4 GB.

1.3 fs.trash.interval

fs.trash.interval=10080 (unit: minutes, so deleted files stay in the trash for 7 days)
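Because the unit is easy to misread, here is a minimal Java sketch that converts the fs.trash.interval value into a retention period. The class and method names are illustrative only.

```java
import java.time.Duration;

// Converts an fs.trash.interval value (minutes) into a retention period.
class TrashInterval {
    static Duration retention(long trashIntervalMinutes) {
        // fs.trash.interval is expressed in minutes; 0 disables the trash feature
        return Duration.ofMinutes(trashIntervalMinutes);
    }

    public static void main(String[] args) {
        Duration d = retention(10080);
        System.out.println(d.toDays() + " days"); // prints "7 days"
    }
}
```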

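1.4 dfs.datanode.du.reserved

dfs.datanode.du.reserved=374820962816 (in bytes; roughly 350 GB, precisely about 349.08 GiB; this space is reserved on each DataNode volume for non-HDFS use)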
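dfs.datanode.du.reserved takes a raw byte count, so converting between bytes and GiB is a common source of mistakes. The sketch below (hypothetical class name) shows that the configured value is about 349.08 GiB, and what an exact 350 GiB would be if that was the intent.

```java
import java.util.Locale;

// Converts dfs.datanode.du.reserved (bytes per volume) to and from GiB.
class DuReserved {
    static final long BYTES_PER_GIB = 1024L * 1024 * 1024;

    static long gibToBytes(long gib) {
        return gib * BYTES_PER_GIB;
    }

    static double bytesToGib(long bytes) {
        return (double) bytes / BYTES_PER_GIB;
    }

    public static void main(String[] args) {
        long configured = 374_820_962_816L;
        // about 349.08 GiB
        System.out.printf(Locale.ROOT, "configured = %.2f GiB%n", bytesToGib(configured));
        // 375809638400 bytes
        System.out.println("350 GiB = " + gibToBytes(350) + " bytes");
    }
}
```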

1.5 dfs.namenode.handler.count

NameNode Server threads (the number of NameNode RPC handler threads): 400

2. YARN

2.1 Memory allocated for all YARN containers on a node

yarn.nodemanager.resource.memory-mb: 72 GB

2.2 Container sizes

Minimum Container Size (Memory)

yarn.scheduler.minimum-allocation-mb

1GB

Maximum Container Size (Memory)

yarn.scheduler.maximum-allocation-mb

8GB

Maximum Container Size (VCores)

yarn.scheduler.maximum-allocation-vcores

8
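To see what these limits mean per node, the following Java sketch models container sizing on one NodeManager with 72 GB for containers. It assumes a simplified rule: each memory request is rounded up to a multiple of yarn.scheduler.minimum-allocation-mb, and requests above yarn.scheduler.maximum-allocation-mb are rejected. It ignores vcores, which can limit the container count further.

```java
// Simplified model of YARN container sizing on one NodeManager (memory only).
// Assumption: requests round up to a multiple of the minimum allocation and
// must not exceed the maximum allocation.
class YarnContainers {
    static final int NODE_MB = 72 * 1024; // yarn.nodemanager.resource.memory-mb
    static final int MIN_MB = 1024;       // yarn.scheduler.minimum-allocation-mb
    static final int MAX_MB = 8 * 1024;   // yarn.scheduler.maximum-allocation-mb

    static int normalize(int requestMb) {
        if (requestMb > MAX_MB) {
            throw new IllegalArgumentException("request exceeds maximum allocation: " + requestMb);
        }
        int units = (requestMb + MIN_MB - 1) / MIN_MB; // round up to whole minimum units
        return Math.max(units, 1) * MIN_MB;
    }

    static int containersPerNode(int requestMb) {
        return NODE_MB / normalize(requestMb);
    }

    public static void main(String[] args) {
        System.out.println(containersPerNode(1024)); // 72 smallest containers
        System.out.println(containersPerNode(8192)); // 9 largest containers
        System.out.println(normalize(1500));         // 2048
    }
}
```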

2.3 MapReduce2 Map Memory

mapreduce.map.memory.mb: 3072 -> 4096 (MB)

2.4 MapReduce2 Reduce Memory

mapreduce.reduce.memory.mb: 6144 -> 4096 (MB)

2.5 Sort Allocation Memory

mapreduce.task.io.sort.mb: 2047 -> 1536 (MB)
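The sort buffer is allocated inside the map task's JVM heap, and Hadoop's MapTask rejects mapreduce.task.io.sort.mb values above 2047, which is why the previous default sat at 2047. The sketch below checks the new values against those limits. It assumes a hypothetical heap of 80% of the container size (set through -Xmx in mapreduce.map.java.opts); use the cluster's real ratio.

```java
// Sanity checks for the MapReduce2 memory settings.
// Assumption (hypothetical): map JVM heap = 80% of mapreduce.map.memory.mb.
class MapReduceMemory {
    static final double HEAP_RATIO = 0.8;

    static int heapMb(int containerMb) {
        return (int) (containerMb * HEAP_RATIO);
    }

    static boolean sortBufferFits(int sortMb, int mapContainerMb) {
        // MapTask rejects io.sort.mb above 2047; the buffer must also leave
        // heap room for the map function itself
        return sortMb > 0 && sortMb <= 2047 && sortMb < heapMb(mapContainerMb);
    }

    public static void main(String[] args) {
        int mapMb = 4096, reduceMb = 4096, sortMb = 1536, nodeMb = 72 * 1024;
        System.out.println("map heap ~" + heapMb(mapMb) + " MB");           // 3276 MB
        System.out.println("sort fits: " + sortBufferFits(sortMb, mapMb));  // true
        System.out.println("maps per node: " + nodeMb / mapMb);             // 18
        System.out.println("reduces per node: " + nodeMb / reduceMb);       // 18
    }
}
```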

3. HBase

3.1 HBase Master Maximum Memory

1024 -> 36864 (MB)

3.2 HBase RegionServer heap

HBase is memory-hungry, so we recommend a 20 GB heap. Ambari already defaults to 20 GB, so this value was left unchanged.

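3.3 Number of Handlers per RegionServer

30 -> 60

hbase.regionserver.handler.count: the number of threads that serve RPC requests. The default is 30; 60 is recommended for production. Larger is not always better: when requests are large, for example scans or puts of several MB, the handlers together can hold too much memory, causing frequent GC or even an OutOfMemoryError.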
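A back-of-the-envelope Java sketch of that trade-off: if every handler is busy with one large request at the same time, the request data alone scales linearly with the handler count. The 5 MB payload is a hypothetical stand-in for "several MB", and the estimate ignores the call queue, block cache and GC overhead.

```java
// Rough worst-case estimate of request data held by busy RPC handlers.
// Assumption: every handler holds one request of the given size at once.
class HandlerMemory {
    static long worstCaseMb(int handlers, long requestMb) {
        return (long) handlers * requestMb;
    }

    public static void main(String[] args) {
        long requestMb = 5; // hypothetical "several MB" scan/put payload
        System.out.println("30 handlers: " + worstCaseMb(30, requestMb) + " MB"); // 150 MB
        System.out.println("60 handlers: " + worstCaseMb(60, requestMb) + " MB"); // 300 MB
    }
}
```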

3.4 Maximum Region File Size

hbase.hregion.max.filesize: 10 GB -> 50 GB

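3.5 hbase.hregion.majorcompaction

Default 7 days (604800000 ms) -> 0. A value of 0 disables automatic major compaction. A major compaction can run for hours, so to limit the impact on the business, trigger major compactions manually, or periodically from a script or through the API, during off-peak hours.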
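One way to script the periodic compaction is sketched below in Java. It only runs inside a hypothetical off-peak window (02:00 to 05:00) and emits an HBase shell script with one major_compact command per table. The table names and window are placeholders, and the script is meant to be saved to a file and run with `hbase shell <file>`. The HBase Java client's Admin.majorCompact(TableName) is an alternative if the client library is on the classpath.

```java
import java.time.LocalTime;
import java.util.List;
import java.util.stream.Collectors;

class OffPeakMajorCompaction {
    // Hypothetical off-peak window; adapt it to the cluster's actual traffic
    static final LocalTime WINDOW_START = LocalTime.of(2, 0);
    static final LocalTime WINDOW_END = LocalTime.of(5, 0);

    static boolean inOffPeakWindow(LocalTime now) {
        return !now.isBefore(WINDOW_START) && now.isBefore(WINDOW_END);
    }

    // Builds an hbase shell script with one major_compact command per table,
    // ending with "exit" so a scripted run terminates
    static String shellScript(List<String> tables) {
        return tables.stream()
                .map(table -> "major_compact '" + table + "'")
                .collect(Collectors.joining("\n", "", "\nexit\n"));
    }

    public static void main(String[] args) {
        List<String> tables = List.of("ns1:orders", "ns1:events"); // hypothetical tables
        if (inOffPeakWindow(LocalTime.now())) {
            // Save this output to a file and run: hbase shell <file>
            System.out.print(shellScript(tables));
        } else {
            System.out.println("Outside the off-peak window; skipping major compaction");
        }
    }
}
```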

3.6 Memstore Flush Size

hbase.hregion.memstore.flush.size: the default is 128 MB; keep the default.

3.7 hbase.hregion.memstore.block.multiplier

The default is 4. If a memstore grows beyond hbase.hregion.memstore.flush.size * hbase.hregion.memstore.block.multiplier, writes to that memstore are blocked. To avoid blocking, set a larger value; if it is too large, however, there is a risk of OOM.

If the RegionServer log contains messages such as "Blocking updates for '' on region : memstore size <N>M is >= than blocking <N>M size", this value needs adjusting.
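The blocking threshold is just the product of the two settings, as this small Java sketch (illustrative names) shows: with the defaults, writes block once the memstore reaches 128 MB * 4 = 512 MB, and raising the multiplier to 8 moves the threshold to 1024 MB at the cost of more heap per region.

```java
// Computes the memstore size at which HBase blocks updates.
class MemstoreBlocking {
    static long blockingThresholdMb(long flushSizeMb, int multiplier) {
        return flushSizeMb * multiplier;
    }

    // Mirrors the log wording: blocked when memstore size >= blocking size
    static boolean writesBlocked(long memstoreMb, long flushSizeMb, int multiplier) {
        return memstoreMb >= blockingThresholdMb(flushSizeMb, multiplier);
    }

    public static void main(String[] args) {
        System.out.println(blockingThresholdMb(128, 4)); // 512
        System.out.println(writesBlocked(600, 128, 4));  // true
        System.out.println(writesBlocked(600, 128, 8));  // false
    }
}
```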

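3.8 hbase.hstore.compaction.min

The default is 3. When the number of store files in any store reaches this value, a regular (minor) compaction is triggered. It can be set to 5-8 and the store files merged during the periodic manual major compaction instead. This reduces the number of compactions but makes each compaction take longer. The older name for this parameter is hbase.hstore.compactionThreshold.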

3.9 hbase.hstore.compaction.max

The default is 10: the maximum number of store files merged in a single compaction, which keeps compactions from causing OOM.
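The interplay of the two limits can be sketched as below. This is a deliberately simplified model, not HBase's actual selection policy (which also weighs file-size ratios and other settings). It assumes a compaction is considered once the file count reaches compaction.min, and that at most compaction.max files are merged at a time.

```java
// Simplified model of hbase.hstore.compaction.min / .max.
// Not HBase's real selection algorithm.
class CompactionLimits {
    // Assumption: a compaction is considered once the count reaches min
    static int filesToCompact(int storeFiles, int min, int max) {
        if (storeFiles < min) {
            return 0; // not enough store files yet: no compaction
        }
        return Math.min(storeFiles, max); // never merge more than max at once
    }

    public static void main(String[] args) {
        System.out.println(filesToCompact(3, 3, 10));  // 3 with the defaults
        System.out.println(filesToCompact(15, 3, 10)); // capped at 10
        System.out.println(filesToCompact(5, 6, 10));  // 0 once min is raised to 6
    }
}
```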

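3.10 hbase.hstore.blockingStoreFiles

The default is 10. If any store (other than a store of the .META. table) has more store files than this value, a split or compaction runs before the memstore is flushed, and the region is added to the flushQueue for a delayed flush. During this time writes are blocked until the compaction completes or the time set by hbase.hstore.blockingWaitTime (default 90 s) elapses. It can be set to 30 so that memstores are flushed in time. When the RegionServer log shows many "Region has too many store files; delaying flush up to 90000ms" messages, this value needs adjusting.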
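The flush-delay rule described above can be sketched as a small Java function (hypothetical names). It returns how much longer a flush may be held back: zero when the store is at or under the limit, otherwise whatever remains of the 90 s blockingWaitTime. This shows why raising the limit from 10 to 30 removes the delay for a store with 12 files.

```java
// Sketch of the blockingStoreFiles / blockingWaitTime flush delay.
class FlushDelay {
    static final long BLOCKING_WAIT_MS = 90_000; // hbase.hstore.blockingWaitTime default

    // Remaining delay before the flush proceeds anyway (0 = flush now)
    static long remainingDelayMs(int storeFiles, int blockingStoreFiles, long waitedMs) {
        if (storeFiles <= blockingStoreFiles) {
            return 0; // within the limit: flush immediately
        }
        // Too many store files: wait for compaction, but never beyond the wait time
        return Math.max(0L, BLOCKING_WAIT_MS - waitedMs);
    }

    public static void main(String[] args) {
        System.out.println(remainingDelayMs(12, 10, 30_000)); // 60000
        System.out.println(remainingDelayMs(12, 30, 0));      // 0
    }
}
```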

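3.11 Other changed options

zookeeper.znode.parent: changed from /hbase-unsecure to /hbase

hbase.master.info.port: changed from 16010 to 60010

hbase.master.port: changed from 16000 to 60000

hbase.regionserver.port: changed from 16020 to 60020

hbase.regionserver.info.port: changed from 16030 to 60030

(The 600x0 ports were HBase's defaults before version 1.0.)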

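3.12 zookeeper.session.timeout

The default is 90 seconds (90000 ms).

If it is too long, then when a RegionServer dies, ZooKeeper has to wait out this timeout before expiring its session (a patch already addresses this), so the Master cannot reassign the regions promptly.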

4. ZooKeeper configuration

4.1 ZooKeeper Server Maximum Memory

1024 -> 8096 (MB)

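4.2 maxClientCnxns

Log in as the zookeeper user and edit zoo.cfg:

vi /etc/zookeeper/2.6.5.0-292/0/zoo.cfg

maxClientCnxns=300

maxClientCnxns limits the number of concurrent connections that a single client, identified by IP address, may open to one ZooKeeper server.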

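4.3 maxSessionTimeout

When minSessionTimeout and maxSessionTimeout are not set explicitly, the ZooKeeper server derives them from tickTime as follows:

tickTime=3000 (milliseconds)

    public int getMinSessionTimeout() {
        // unset (-1): default to 2 ticks
        return minSessionTimeout == -1 ? tickTime * 2 : minSessionTimeout;
    }

    public int getMaxSessionTimeout() {
        // unset (-1): default to 20 ticks
        return maxSessionTimeout == -1 ? tickTime * 20 : maxSessionTimeout;
    }

With tickTime=3000 ms, the defaults are therefore 6000 ms (6 s) and 60000 ms (60 s).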
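The ZooKeeper server clamps each client's requested session timeout into [minSessionTimeout, maxSessionTimeout]. The Java sketch below (illustrative class name, same default formulas as above) shows the consequence for HBase: its default 90000 ms zookeeper.session.timeout is cut to 60000 ms with tickTime=3000 unless maxSessionTimeout is raised explicitly.

```java
// Sketch of ZooKeeper session timeout negotiation (server-side clamping).
class ZkSessionTimeout {
    static int minSessionTimeout(int tickTime, int configuredMin) {
        return configuredMin == -1 ? tickTime * 2 : configuredMin;
    }

    static int maxSessionTimeout(int tickTime, int configuredMax) {
        return configuredMax == -1 ? tickTime * 20 : configuredMax;
    }

    // Clamp the client's requested timeout into [min, max]
    static int negotiate(int requestedMs, int tickTime, int configuredMin, int configuredMax) {
        int min = minSessionTimeout(tickTime, configuredMin);
        int max = maxSessionTimeout(tickTime, configuredMax);
        return Math.max(min, Math.min(requestedMs, max));
    }

    public static void main(String[] args) {
        System.out.println(negotiate(90_000, 3000, -1, -1));      // 60000
        System.out.println(negotiate(90_000, 3000, -1, 120_000)); // 90000
    }
}
```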

5. Ambari Server configuration

5.1 metrics_collector_heapsize

Heap of the Ambari Metrics Collector: default 512 MB -> 4096 MB

5.2 On the Ambari Server host, edit the ambari-env.sh file:

1. vi /var/lib/ambari-server/ambari-env.sh
2. In the AMBARI_JVM_ARGS variable, replace the default -Xmx2048m with the following:

-Xms4096m -Xmx4096m -XX:PermSize=512m -XX:MaxPermSize=512m

(On JDK 8, PermGen has been replaced by Metaspace, so the two PermSize flags are ignored with a warning.)

3. Restart Ambari Server for this change to take effect:

ambari-server restart