Creating a database

create database if not exists myhive;
use myhive;

Note: the HDFS location where databases are created is set by the following property in hive-site.xml:

<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>

Creating a database at a specified HDFS location

create database myhive2 location '/myhive2';
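
To check where a database actually lives, one option is to print the warehouse property and compare it with the location reported for each database. This is a minimal sketch, assuming it runs in the Hive shell against the two databases created above; the exact output layout depends on the Hive version.

```sql
-- Print the current value of the warehouse root property.
set hive.metastore.warehouse.dir;

-- myhive was created without LOCATION, so its location should sit under
-- the warehouse root (by default in a directory named myhive.db).
desc database myhive;

-- myhive2 was created with location '/myhive2', so that path should be reported.
desc database myhive2;
```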

Altering a database (database metadata is rarely changed; this example only sets a creation-time property)

alter database myhive2 set dbproperties('createtime'='20180611');

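Viewing basic information about a database

desc database myhive2;

Viewing extended information about a database (including dbproperties)

desc database extended myhive2;

Dropping a database

drop database myhive2; (works only when the database is empty)

drop database myhive cascade; (forces the drop, removing the tables in the database as well)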

Table creation syntax

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...)
[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[ROW FORMAT row_format]
[STORED AS file_format]
[LOCATION hdfs_path]

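Notes:

1. EXTERNAL creates an external table.

2. LOCATION specifies the HDFS path where the table data is stored.

3. PARTITIONED BY creates a partitioned table, which stores the data of each partition in its own directory.

4. CLUSTERED BY creates a bucketed table, which distributes rows into separate files according to the hash of the bucketing columns.

5. STORED AS specifies the storage format, for example SEQUENCEFILE, TEXTFILE, or RCFILE.

6. ROW FORMAT specifies the delimiters used to parse each row.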
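
The clauses above can be combined in a single statement. The following is only an illustrative sketch: the table name orders_demo, its columns, the bucket count, and the /tmp/orders_demo path are all hypothetical and do not come from the original notes.

```sql
-- Hypothetical table that uses every optional clause from the syntax above.
create external table if not exists orders_demo (
  order_id string comment 'hypothetical order key',
  user_id  string,
  amount   double
)
comment 'illustrative table, not part of the original notes'
-- One directory per dt value.
partitioned by (dt string)
-- Rows are hashed on user_id into 4 files per partition, sorted by order_id.
clustered by (user_id) sorted by (order_id asc) into 4 buckets
-- Tab-separated plain text files.
row format delimited fields terminated by '\t'
stored as textfile
-- Assumed HDFS path; the data stays here if the table is dropped.
location '/tmp/orders_demo';
```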

Four table models

1. Managed (internal) tables: dropping the table also deletes its data on HDFS.

Creating a table with a specified field delimiter

create table if not exists stu(id int,name string) row format delimited fields terminated by '\t' stored as textfile location '/hive2/stu';

Creating a table from a query result

create table stu3 as select * from stu2;

Creating a table from the structure of an existing table

create table stu4 like stu2;

Checking the table type

desc formatted stu2;

2. External tables: dropping the table does not delete the data on HDFS.

Creating an external table

create external table teacher(t_id string,t_name string) row format delimited fields terminated by '\t';
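
The difference between the two table types can be seen by creating one of each and dropping both. This is a rough sketch, assuming the Hive shell (for the dfs command); the table names demo_managed and demo_external and the /tmp/demo_external path are hypothetical.

```sql
-- Hypothetical demo tables; the names and the /tmp path are assumptions.
create table demo_managed(id int);
create external table demo_external(id int) location '/tmp/demo_external';

-- The "Table Type" row should read MANAGED_TABLE for the first table
-- and EXTERNAL_TABLE for the second.
desc formatted demo_managed;
desc formatted demo_external;

-- Dropping the managed table also removes its directory under the warehouse.
drop table demo_managed;

-- Dropping the external table removes only the metadata.
drop table demo_external;

-- The external location should still be present on HDFS after the drop.
dfs -ls /tmp/demo_external;
```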

Loading from the local file system

load data local inpath '/export/servers/hivedata/student.csv' into table stu;

Loading from HDFS

Load data into the table from HDFS. The file must be uploaded to HDFS first, and the load is really just a file move.

cd /export/servers/hivedatas

hdfs dfs -mkdir -p /hivedatas

hdfs dfs -put techer.csv /hivedatas/

load data inpath '/hivedatas/techer.csv' into table teacher;
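
The move can be observed by listing the source directory before and after the load. The sketch below repeats the load step shown above rather than adding a second one, and assumes it is run from the Hive shell with the teacher table created earlier.

```sql
-- Before the load, the uploaded file is listed in the staging directory.
dfs -ls /hivedatas/;

load data inpath '/hivedatas/techer.csv' into table teacher;

-- After the load, the file should no longer be listed here, because it was
-- moved into the directory of the teacher table.
dfs -ls /hivedatas/;

-- Quick check that the rows are readable through the table.
select * from teacher limit 5;
```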

3. Partitioned tables: data is placed in a separate directory per partition.

Creating a partitioned table

create table score(s_id string,c_id string, s_score int) partitioned by (month string) row format delimited fields terminated by '\t';

Creating a table with multiple partition columns

create table score2 (s_id string,c_id string, s_score int) partitioned by (year string,month string,day string) row format delimited fields terminated by '\t';

Loading data into a partitioned table

load data local inpath '/export/servers/hivedatas/score.csv' into table score partition (month='201806');

Loading data into a table with multiple partition columns

load data local inpath '/export/servers/hivedatas/score.csv' into table score2 partition(year='2018',month='06',day='01');

Listing partitions

show partitions score;
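
Because each partition is a directory, a filter on the partition column lets Hive read only the matching directory. The following sketch assumes the score and score2 tables loaded above, the myhive database, and the default warehouse path from hive-site.xml; adjust the path if the warehouse is configured elsewhere.

```sql
-- Only the month=201806 directory needs to be read for this query.
select s_id, c_id, s_score from score where month = '201806';

-- For a multi-level table, a partial partition spec lists a subset of partitions.
show partitions score2 partition(year = '2018');

-- Assumed default location; each partition appears as a month=... subdirectory.
dfs -ls /user/hive/warehouse/myhive.db/score;
```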

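Adding a partition

alter table score add partition(month='201805');

Adding several partitions at once

alter table score add partition(month='201804') partition(month = '201803');

Note: after a partition is added, a new directory for it appears under the table directory on HDFS.

Dropping a partition

alter table score drop partition(month = '201806');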

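Combined exercise: external partitioned table

Requirement: a file named score.csv is stored in the cluster directory /scoredatas/month=201806. A new file is generated every day and placed in the directory for its date. Other users share these files, so they must not be moved. Create a Hive table for this data, load the data into it for statistical analysis, and make sure the data survives dropping the table.

Implementation:

Preparing the data:

hdfs dfs -mkdir -p /scoredatas/month=201806

hdfs dfs -put score.csv /scoredatas/month=201806/

Creating an external partitioned table that points at the data directory

create external table score4(s_id string, c_id string,s_score int) partitioned by (month string) row format delimited fields terminated by '\t' location '/scoredatas';

Repairing the table, which registers the partition directories found under the table location in the metastore

msck repair table score4;

Once the repair succeeds, the data in those directories can be queried through the table.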
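
As an alternative to msck repair, a single partition directory can be registered explicitly. This is a sketch under the same assumptions as the exercise (table score4, data under /scoredatas/month=201806), not a required extra step.

```sql
-- Register one existing directory as a partition without moving any files.
alter table score4 add if not exists partition(month = '201806') location '/scoredatas/month=201806';

-- The partition should now be listed, and its rows should be countable.
show partitions score4;
select count(*) from score4 where month = '201806';
```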

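4. Bucketed tables: data is distributed into separate files.

Notes:

A bucketed table should not be populated with load data directly; insert the data with insert overwrite instead.

Step 1: enable bucketing enforcement (needed on Hive 1.x; Hive 2.0 and later always enforce bucketing and no longer have this property):

set hive.enforce.bucketing=true;

Step 2: set the number of reducers to match the number of buckets:

set mapreduce.job.reduces=3;

Creating the bucketed table:

create table course(c_id string,c_name string,t_id string) clustered by(c_id) into 3 buckets row format delimited fields terminated by '\t';

Steps for inserting data:

Creating an ordinary table:

create table course_common(c_id string,c_name string,t_id string) row format delimited fields terminated by '\t';

Loading data into the ordinary table

load data local inpath '/export/servers/hivedatas/course.csv' into table course_common;

Loading data into the bucketed table with insert overwrite

insert overwrite table course select * from course_common cluster by(c_id);

The course table directory on HDFS now contains three files.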
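
One way to confirm the bucketing is to sample a single bucket and to list the table directory. The sketch below assumes the course table populated as above, the myhive database, and the default warehouse path; the file names in the listing depend on the Hive version and execution engine.

```sql
-- Read only the first of the three buckets, using the bucketing column.
select * from course tablesample(bucket 1 out of 3 on c_id);

-- Assumed default location; the listing should show three bucket files.
dfs -ls /user/hive/warehouse/myhive.db/course;
```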

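Altering a table

(1) Show the table structure

desc score5;

(2) Add columns

alter table score5 add columns (mycol string, mysco string);

(3) Show the table structure

desc score5;

(4) Rename a column and change its type

alter table score5 change column mysco mysconew int;

(5) Show the table structure

desc score5;

Dropping a table

drop table score;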

Loading data

There are two ways to load data into a table: load data, and inserting the result of a query:

insert overwrite table xxx select * from xx;

Loading data with load

load data local inpath '/export/servers/hivedatas/score.csv' overwrite into table score partition(month='201806');

Loading data with a query

create table score4 like score;

insert overwrite table score4 partition(month = '201806') select s_id,c_id,s_score from score;
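
The query above reads every row of score, whatever partition it belongs to. If only one source partition should be copied, a filter can be added, and insert into can be used when the existing rows should be kept. This is a sketch assuming the same score and score4 tables.

```sql
-- Copy only the matching source partition and replace the target partition.
insert overwrite table score4 partition(month = '201806')
select s_id, c_id, s_score from score where month = '201806';

-- insert into appends to the partition instead of replacing its contents.
insert into table score4 partition(month = '201806')
select s_id, c_id, s_score from score where month = '201806';
```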

Exporting data

Exporting data from Hive tables (for reference)

Data in a Hive table can be exported to other destinations, such as the local Linux disk, HDFS, or MySQL.

Exporting with insert

1) Export a query result to the local file system

insert overwrite local directory '/export/servers/exporthive' select * from score;

2) Export a query result to the local file system with formatting

insert overwrite local directory '/export/servers/exporthive' row format delimited fields terminated by '\t' collection items terminated by '#' select * from student;

3) Export a query result to HDFS (omit local)

insert overwrite directory '/export/servers/exporthive' row format delimited fields terminated by '\t' collection items terminated by '#' select * from score;

Exporting to the local file system with a Hadoop command (run from the Hive shell)

dfs -get /export/servers/exporthive/000000_0 /export/servers/exporthive/local.txt;

Exporting with the hive shell command

Basic syntax: hive -e "statement" > file, or hive -f script > file

bin/hive -e "select * from myhive.score;" > /export/servers/exporthive/score.txt

Exporting to HDFS with export

export table score to '/export/exporthive/score';
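
A directory written by export holds both the data and the table metadata, so it can be read back with import. The sketch below assumes the export above succeeded; score_imported is a hypothetical table name that should not already exist.

```sql
-- Recreate the table, including its partitions, from the exported directory.
import table score_imported from '/export/exporthive/score';

-- The imported table should list the same partitions as the source table.
show partitions score_imported;
```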

Original link: https://blog.csdn.net/qq_25534101/article/details/115189732