HiveServer2 Explained - 小飞侠

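1 HiveServer2 Overview

HiveServer2 (HS2) is a service that lets clients run queries against Hive. Compared with HiveServer1, it supports concurrent requests from multiple clients as well as authentication.

The Hive CLI and hive -e offer only a limited way of working. HS2 lets remote clients written in languages such as Java and Python submit requests to Hive and then fetch the results.

HS2 uses TThreadPoolServer for TCP (binary) mode and a Jetty server for HTTP mode.

TThreadPoolServer assigns one worker thread to each TCP connection, and each thread stays bound to its connection even while that connection is idle. This is a potential performance problem: a large number of connections leads to a large number of threads. HS2 may switch to TThreadedSelectorServer in the future.

HTTP mode is the one to use when a proxy such as HAProxy has to sit between the client and the server, for example for load balancing or for other reasons.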

2 HiveServer2 Configuration Parameters

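hive.server2.transport.mode: the HS2 transport mode, binary or http

hive.server2.thrift.http.port: HTTP listening port, default 10001

hive.server2.thrift.http.min.worker.threads: minimum number of worker threads in HTTP mode

hive.server2.thrift.http.max.worker.threads: maximum number of worker threads in HTTP mode

hive.server2.thrift.worker.keepalive.time: how long an idle worker thread is kept alive

When the number of worker threads exceeds the minimum, idle worker threads are terminated once they have been idle for longer than this keepalive time.

hive.server2.thrift.max.message.size: maximum size of a message that HS2 will accept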

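hive.server2.authentication: the HS2 authentication mechanism:

NONE: no authentication check

LDAP: LDAP-based authentication

KERBEROS: Kerberos-based authentication

CUSTOM: a custom authentication provider

PAM: Pluggable Authentication Modules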

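hive.server2.thrift.port: TCP listening port, default 10000

hive.server2.thrift.min.worker.threads: minimum number of worker threads in TCP mode, default 5

hive.server2.thrift.max.worker.threads: maximum number of worker threads in TCP mode, default 500

hive.server2.thrift.bind.host: host to bind the TCP interface to, default localhost

hive.server2.thrift.http.max.idle.time: maximum idle time for a connection on the server in HTTP mode

hive.server2.thrift.http.worker.keepalive.time: keepalive time of worker threads in HTTP mode

hive.server2.async.exec.threads: maximum number of threads in the asynchronous execution thread pool, default 50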
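The interplay between the minimum and maximum worker thread settings and the keepalive time described above can be illustrated with the JDK's own java.util.concurrent.ThreadPoolExecutor. This is only an analogy, assuming the HS2 worker pool behaves like a standard JDK pool with no task queue. The class WorkerPoolSketch, its method names and the small numbers are made up for illustration and are not taken from Hive.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration class, not part of Hive.
final class WorkerPoolSketch {

    // Core size stands in for min.worker.threads, maximum size for
    // max.worker.threads, keep-alive for worker.keepalive.time.
    static ThreadPoolExecutor newPool(int minWorkers, int maxWorkers, long keepAliveMillis) {
        return new ThreadPoolExecutor(
                minWorkers, maxWorkers,
                keepAliveMillis, TimeUnit.MILLISECONDS,
                new SynchronousQueue<>()); // no queueing: every task needs its own thread
    }

    // Occupies one worker per simulated connection and returns the pool size
    // observed while all of them are busy.
    static int peakPoolSize(ThreadPoolExecutor pool, int connections) throws InterruptedException {
        CountDownLatch started = new CountDownLatch(connections);
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < connections; i++) {
            pool.execute(() -> {
                started.countDown();
                try {
                    release.await(); // hold the thread, like an open connection does
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        started.await();
        int size = pool.getPoolSize();
        release.countDown(); // let every simulated connection finish
        return size;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = newPool(2, 8, 200);
        System.out.println("workers while busy: " + peakPoolSize(pool, 5));
        Thread.sleep(1000); // stay idle for longer than the keep-alive
        System.out.println("workers after idle: " + pool.getPoolSize());
        pool.shutdown();
    }
}
```

With five simultaneous "connections" the pool grows past its minimum of two. Once the work is done and the keepalive time has passed, the surplus idle threads are discarded and only the minimum remains.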

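3 HiveServer2 Clients

3.1 beeline

3.1.1 Using beeline in scripts

When beeline is run from a script, options such as -u, -n, -p and -e can be passed directly to make it carry out the operation. The options are: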

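-u <database URL>: URL of the database to connect to

-n <username>: user name for the database connection

-p <password>: password for the database connection

-d <driver class>: name of the driver class to use

-i <init file>: script file used for initialization

-e <query>: query to execute

-f <exec file>: script file to execute

-w / --password-file <password file>: read the password from a file

--hiveconf <key=value>: set a Hive configuration property

--hivevar name=value: set a Hive variable

--showHeader=[true/false]: whether to show column names in query results

--autoCommit=[true/false]: whether to commit transactions automatically

--force=[true/false]: whether to keep running the script after an error

--outputformat=[table/vertical/csv2/tsv2/dsv]: format used to display results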

Examples:

beeline -u jdbc:hive2://hadoop-all-02:10000 -n hadoop -p hadoop -e 'USE hadoop; SELECT * FROM emp'

or: beeline -u jdbc:hive2://hadoop-all-02:10000 -n hadoop -p hadoop -e 'SELECT * FROM hadoop.emp'

beeline -u jdbc:hive2://hadoop-all-02:10000 -n hadoop -p hadoop -e 'SELECT * FROM hadoop.emp' \
  -d org.apache.hive.jdbc.HiveDriver

beeline -u jdbc:hive2://hadoop-all-02:10000 -n hadoop -p hadoop -f '/opt/shell/hs2.sql'
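When such a beeline call has to be launched from a program instead of being typed into a shell, the same options can be assembled as an argument list. The following is a minimal sketch, assuming beeline is on the PATH. The class BeelineCommandSketch and its methods are hypothetical helpers; only the flags -u, -n, -p and -e come from the option list above.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hive or beeline.
final class BeelineCommandSketch {

    // Assembles the argument list for a non-interactive beeline call.
    static List<String> buildCommand(String url, String user, String password, String query) {
        List<String> cmd = new ArrayList<>();
        cmd.add("beeline");
        cmd.add("-u");
        cmd.add(url);
        cmd.add("-n");
        cmd.add(user);
        cmd.add("-p");
        cmd.add(password);
        cmd.add("-e");
        cmd.add(query); // passed as one argument, so no shell quoting is needed
        return cmd;
    }

    // Runs the command, forwards its output to this process's console
    // and returns the exit code.
    static int run(List<String> cmd) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) {
        List<String> cmd = buildCommand(
                "jdbc:hive2://hadoop-all-02:10000", "hadoop", "hadoop",
                "SELECT * FROM hadoop.emp");
        System.out.println(String.join(" ", cmd));
        // On a host where beeline and HiveServer2 are reachable, the command
        // could then be executed with run(cmd).
    }
}
```

Passing the query as a separate list element sidesteps the quoting issues that arise when the whole command line is built as a single string.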

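3.1.2 Using beeline as an interactive command line

!connect: open a new connection to a database

!close: close the current database connection

!closeall: close all currently open connections

!columns: list all columns of the specified table

!commit: commit the current transaction

!describe: describe a table

!dropall: drop all tables in the current database

!indexes: list the indexes of the specified table

!list: list the current connections

!outputformat: set the output format

!procedures: list all stored procedures

!properties: connect to the database using the specified properties file

!quit: exit the program

!rollback: roll back the transaction

!run: run a script from the specified file

!set: set a variable

!sh: execute a Linux shell command

!tables: list all tables in the database

Examples:

!connect jdbc:hive2://hadoop-all-02:10000 hadoop hadoop

!sh rm -rf /opt/shell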

3.2 JDBC

3.2.1 JDBC connection URL format in TCP mode

jdbc:hive2://<host1>:<port1>,<host2>:<port2>/dbName;initFile=<file>;sess_var_list?hive_conf_list#hive_var_list

The URL can name several HiveServer2 instances; multiple instances are separated by commas.
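How the parts of this URL fit together can be sketched with a small builder. This is an illustrative helper only: the class HiveUrlSketch, the second host hadoop-all-03 and the variable tbl=emp are assumptions made for the example. The sketch also assumes that the entries within each of the three lists are key=value pairs separated by semicolons.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of the Hive JDBC driver.
final class HiveUrlSketch {

    // Joins key=value pairs with ';', the separator assumed inside each list.
    static String joinPairs(Map<String, String> pairs) {
        StringJoiner joiner = new StringJoiner(";");
        pairs.forEach((key, value) -> joiner.add(key + "=" + value));
        return joiner.toString();
    }

    // Builds jdbc:hive2://host:port,.../db;sess_vars?hive_conf#hive_vars,
    // leaving out every list that is empty.
    static String buildUrl(List<String> hostPorts, String dbName,
                           Map<String, String> sessionVars,
                           Map<String, String> hiveConf,
                           Map<String, String> hiveVars) {
        StringBuilder url = new StringBuilder("jdbc:hive2://");
        url.append(String.join(",", hostPorts)); // several HS2 instances, comma separated
        url.append('/').append(dbName);
        if (!sessionVars.isEmpty()) {
            url.append(';').append(joinPairs(sessionVars));
        }
        if (!hiveConf.isEmpty()) {
            url.append('?').append(joinPairs(hiveConf));
        }
        if (!hiveVars.isEmpty()) {
            url.append('#').append(joinPairs(hiveVars));
        }
        return url.toString();
    }

    public static void main(String[] args) {
        Map<String, String> hiveConf = new LinkedHashMap<>();
        hiveConf.put("hive.exec.parallel", "true");
        Map<String, String> hiveVars = new LinkedHashMap<>();
        hiveVars.put("tbl", "emp"); // illustrative variable name
        String url = buildUrl(
                Arrays.asList("hadoop-all-02:10000", "hadoop-all-03:10000"),
                "hadoop",
                Collections.<String, String>emptyMap(),
                hiveConf,
                hiveVars);
        System.out.println(url);
    }
}
```

The resulting string can be passed to DriverManager.getConnection in the same way as the single-host URL used in the code examples of this section.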

3.2.2 JDBC connection URL format in HTTP mode

jdbc:hive2://<host>:<port>/<db>;transportMode=http;httpPath=<http_endpoint>

http_endpoint: the HTTP endpoint configured in hive-site.xml; the default is cliservice

The default port is 10001.
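A connection attempt in HTTP mode might look like the following sketch. It assumes that HS2 on hadoop-all-02 has been started with hive.server2.transport.mode set to http and that the Hive JDBC driver is on the classpath. The class name HiveHttpUrlSketch and the helper httpUrl are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper, not part of the Hive JDBC driver.
final class HiveHttpUrlSketch {

    // Fills in the HTTP-mode URL pattern shown above.
    static String httpUrl(String host, int port, String db, String httpPath) {
        return "jdbc:hive2://" + host + ":" + port + "/" + db
                + ";transportMode=http;httpPath=" + httpPath;
    }

    public static void main(String[] args) {
        // 10001 and cliservice are the defaults mentioned above.
        String url = httpUrl("hadoop-all-02", 10001, "hadoop", "cliservice");
        System.out.println(url);
        // try-with-resources closes the connection even if a later call fails.
        try (Connection conn = DriverManager.getConnection(url, "hadoop", "hadoop")) {
            System.out.println("connected: " + !conn.isClosed());
        } catch (SQLException e) {
            // Also reached when no Hive driver is on the classpath.
            System.err.println("could not connect: " + e.getMessage());
        }
    }
}
```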

Code example:

// HiveJDBCTools.java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class HiveJDBCTools {

    public static final String HIVE_DRIVER = "org.apache.hive.jdbc.HiveDriver";
    public static final String HIVE_URL = "jdbc:hive2://hadoop-all-02:10000/hadoop";
    public static final String USERNAME = "hadoop";
    public static final String PASSWORD = "hadoop";

    // Loads the Hive JDBC driver and opens a connection; returns null on failure.
    public static Connection getConnection() {
        try {
            Class.forName(HIVE_DRIVER);
            return DriverManager.getConnection(HIVE_URL, USERNAME, PASSWORD);
        } catch (ClassNotFoundException e) {
            // The Hive JDBC driver is not on the classpath.
            e.printStackTrace();
        } catch (SQLException e) {
            // The connection to HiveServer2 could not be established.
            e.printStackTrace();
        }
        return null;
    }

    // Prepares a statement for the given SQL; returns null on failure.
    public static PreparedStatement prepare(Connection conn, String sql) {
        try {
            return conn.prepareStatement(sql);
        } catch (SQLException e) {
            e.printStackTrace();
        }
        return null;
    }

    // Closes the result set, statement and connection, skipping any that are null.
    public static void close(ResultSet rs, PreparedStatement ps, Connection conn) {
        try {
            if (rs != null) {
                rs.close();
            }
            if (ps != null) {
                ps.close();
            }
            if (conn != null) {
                conn.close();
            }
        } catch (SQLException e) {
            e.printStackTrace();
        }
        System.out.println("Resource Closed!!");
    }
}

// HiveQueryTools.java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class HiveQueryTools {

    // Runs the query and prints each row on one line, column values separated by tabs.
    public static void query(String sql) {
        Connection conn = HiveJDBCTools.getConnection();
        if (conn == null) {
            return;
        }
        PreparedStatement ps = HiveJDBCTools.prepare(conn, sql);
        if (ps == null) {
            // Release the connection that was already opened.
            HiveJDBCTools.close(null, null, conn);
            return;
        }
        ResultSet rs = null;
        try {
            rs = ps.executeQuery();
            int columns = rs.getMetaData().getColumnCount();
            while (rs.next()) {
                for (int i = 0; i < columns; i++) {
                    // JDBC column indexes start at 1.
                    System.out.print(rs.getString(i + 1));
                    System.out.print("\t");
                }
                System.out.println();
            }
        } catch (SQLException e) {
            e.printStackTrace();
        } finally {
            // Always release the resources, whether or not the query succeeded.
            HiveJDBCTools.close(rs, ps, conn);
        }
    }

    public static void main(String[] args) {
        String sql = "SELECT e.ename, e.job, d.dname, d.loc FROM emp e JOIN dept d ON e.deptno = d.deptno";
        HiveQueryTools.query(sql);
    }
}

package com.hive.client;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class HiveJdbcClient {

    private static String driverName = "org.apache.hive.jdbc.HiveDriver";

    /**
     * @param args
     * @throws SQLException
     */
    public static void main(String[] args) throws SQLException {
        try {
            Class.forName(driverName);
        } catch (ClassNotFoundException e) {
            // The Hive JDBC driver is not on the classpath.
            e.printStackTrace();
            System.exit(1);
        }
        // Replace "hadoop" here with the name of the user the queries should run as.
        try (Connection con = DriverManager.getConnection(
                     "jdbc:hive2://hadoop-all-02:10000/hadoop", "hadoop", "hadoop");
             Statement stmt = con.createStatement()) {
            String tableName = "testHiveDriverTable";
            stmt.execute("drop table if exists " + tableName);
            stmt.execute("create table " + tableName + " (key int, value string)");

            // show tables
            String sql = "show tables '" + tableName + "'";
            System.out.println("Running: " + sql);
            ResultSet res = stmt.executeQuery(sql);
            if (res.next()) {
                System.out.println(res.getString(1));
            }

            // describe table
            sql = "describe " + tableName;
            System.out.println("Running: " + sql);
            res = stmt.executeQuery(sql);
            while (res.next()) {
                System.out.println(res.getString(1) + "\t" + res.getString(2));
            }

            // load data into table
            // NOTE: filepath has to be local to the hive server
            // NOTE: /tmp/a.txt is a ctrl-A separated file with two fields per line
            String filepath = "/tmp/a.txt";
            sql = "load data local inpath '" + filepath + "' into table " + tableName;
            System.out.println("Running: " + sql);
            stmt.execute(sql);

            // select * query
            sql = "select * from " + tableName;
            System.out.println("Running: " + sql);
            res = stmt.executeQuery(sql);
            while (res.next()) {
                System.out.println(String.valueOf(res.getInt(1)) + "\t" + res.getString(2));
            }

            // regular hive query
            sql = "select count(1) from " + tableName;
            System.out.println("Running: " + sql);
            res = stmt.executeQuery(sql);
            while (res.next()) {
                System.out.println(res.getString(1));
            }
        }
    }
}

Original link: https://blog.csdn.net/zhanglh046/article/details/78572926
