[Hive] Hive Table Compression

Post author: xfxia
Post published: September 23, 2023
Post category: Other

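(1) Compression Overview

Compression codecs supported by MR (MapReduce): see Table 1.

To support multiple compression/decompression algorithms, Hadoop introduced codecs (coder/decoders): see Table 2.

Comparison of compression performance: see Table 3.

Note: Tables 1, 2, and 3 are taken from the web.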

Suppose we have the following table:

create table emp_t(
  id int,
  name string,
  deptno int
)
row format delimited
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
stored as textfile;

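(2) Enabling Compression for Map Output

Compressing map output reduces the amount of data transferred between the Map and Reduce tasks of a job. The configuration is as follows.

Hands-on example:

Enable compression of Hive's intermediate data: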

set hive.exec.compress.intermediate=true;

Enable map output compression in MapReduce:

set mapreduce.map.output.compress=true;

Set the compression codec used for map output in MapReduce:

set mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;

Run a query:

select count(1) as name from emp_t;
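To confirm that the settings above are active in the current session, one option is to print them back. In Hive, `set` followed by a property name and no value prints that property's current value. A minimal sketch, assuming the three properties were set earlier in the same session:

```sql
-- No "=value" part: each statement only prints the current value.
set hive.exec.compress.intermediate;
set mapreduce.map.output.compress;
set mapreduce.map.output.compress.codec;
```

These `set` statements are session-scoped, so they need to be issued again in every new session or script that should use map output compression.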

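(3) Enabling Compression for Reduce Output

When Hive writes output to a table, that output can also be compressed. The property hive.exec.compress.output controls this feature. Users may want to keep the default value of false in the default configuration file, so that output is uncompressed plain text by default. Output compression can then be turned on by setting the property to true in a query or in a script.

Hands-on example:

Enable compression of Hive's final output (default: false):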

set hive.exec.compress.output=true;

Enable compression of MapReduce's final output (default: false):

set mapreduce.output.fileoutputformat.compress=true;

Set the compression codec for MapReduce's final output. The default is:

mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.DefaultCodec

set mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;

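Set the compression type of MapReduce's final output to block compression. The allowed values are NONE, RECORD, and BLOCK, and the property applies to SequenceFile output: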

set mapreduce.output.fileoutputformat.compress.type=BLOCK;

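Right after the table is created, there are no files for it in HDFS.

After inserting data with compression enabled, the resulting file is Snappy-compressed:

insert into emp_t(id,name,deptno) values(1,'zhangsan',1);

After turning MapReduce output compression off and inserting data again, the resulting file is plain text:

insert into emp_t(id,name,deptno) values(2,'zhangsan',1);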
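The steps of turning compression off and checking the resulting files are described above but not shown. A minimal sketch of what they might look like in the Hive CLI follows. The HDFS path is an assumption (the default warehouse directory and the default database), not something stated in the original post:

```sql
-- Turn output compression back off for this session (run before the second insert).
set hive.exec.compress.output=false;
set mapreduce.output.fileoutputformat.compress=false;

-- List the table's data files from inside the Hive CLI.
-- Assumption: the table lives under the default warehouse directory;
-- adjust the path to match hive.metastore.warehouse.dir and your database.
dfs -ls /user/hive/warehouse/emp_t;
```

Files written while Snappy output compression was on typically carry a `.snappy` suffix, while files written afterwards do not, so both kinds can sit side by side in the same table directory.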

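(4) Specifying the Compression Format at Table Creation

Specifying the compression format when the table is created has the same effect as enabling compression for Reduce output.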

create table emp_t1(
  id int,
  name string,
  deptno int
)
row format delimited
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
stored as orc tblproperties ("orc.compress"="SNAPPY");

Insert data:

insert into emp_t1(id,name,deptno) values(2,'zhangsan',1);

Inspect the files in HDFS:
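One way to carry out this check from the Hive CLI is sketched below. As before, the HDFS path assumes the default warehouse directory and the default database, which the original post does not state:

```sql
-- Show the table's properties; orc.compress should appear among them.
show tblproperties emp_t1;

-- List the ORC data files.
-- Assumption: default warehouse location; adjust to your environment.
dfs -ls /user/hive/warehouse/emp_t1;
```

ORC applies compression inside the file, so the file names usually carry no codec suffix. The table properties are the more reliable place to confirm which codec is configured.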

Original link: https://blog.csdn.net/henku449141932/article/details/113558895
