Learning Basic HBase Commands





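Getting Started with HBase



Preface

It has been two years since the big data competition ended. HBase was not covered back then and I never looked into it afterwards, so now that HBase problems have come up at work I am completely lost. I am using the weekend to catch up.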



1. Installation and Startup

The versions I used are:

hadoop-2.7.3
zookeeper-3.4.10
hbase-1.2.4

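I still have ready-made scripts from the competition, but they are for a multi-node cluster. This time I only want to learn HBase, so a single node is enough, and I looked up a few blog posts online.

For the configuration, see this post:


HBase installation tutorial

However, the startup command in that post has a problem. I got HBase running with the startup command from the following post:


Detailed HBase installation tutorial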

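Once everything is configured, start HBase with

<your HBase directory>/bin/start-hbase.sh

and then run

<your HBase directory>/bin/hbase shell

If you land at the shell prompt, the installation and startup succeeded.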



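2. Basic Commands



Environment Information and Permission Assignment

The following commands show the HBase version, the cluster status, and the current user, respectively: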

hbase(main):001:0> version
1.2.4, r67592f3d062743907f8c5ae00dbbe1ae4f69e5af, Tue Oct 25 18:10:20 CDT 2016

hbase(main):002:0> status
1 active master, 0 backup masters, 1 servers, 0 dead, 2.0000 average load

hbase(main):003:0> whoami
root (auth:SIMPLE)
    groups: root
hbase(main):003:0> 

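The grant command assigns permissions. Security is not enabled on my setup, so the command fails here, but the shell prints its help text: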


hbase(main):005:0> grant 'root', 'RWXCA'

ERROR: DISABLED: Security features are not available

Here is some help for this command:
Grant users specific rights.
Syntax : grant <user>, <permissions> [, <@namespace> [, <table> [, <column family> [, <column qualifier>]]]

permissions is either zero or more letters from the set "RWXCA".
READ('R'), WRITE('W'), EXEC('X'), CREATE('C'), ADMIN('A')

Note: Groups and users are granted access in the same way, but groups are prefixed with an '@' 
      character. In the same way, tables and namespaces are specified, but namespaces are 
      prefixed with an '@' character.

For example:

    hbase> grant 'bobsmith', 'RWXCA'
    hbase> grant '@admins', 'RWXCA'
    hbase> grant 'bobsmith', 'RWXCA', '@ns1'
    hbase> grant 'bobsmith', 'RW', 't1', 'f1', 'col1'
    hbase> grant 'bobsmith', 'RW', 'ns1:t1', 'f1', 'col1'


hbase(main):006:0>

(ADMIN stands for administrative rights.)
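The permission argument is just a string of letters. As a small illustration, here is a plain-Python sketch (not HBase code; the helper name `expand_permissions` is made up for this example) that expands such a string using the letter-to-name mapping from the help text above:

```python
# Mapping copied from the grant help text:
# READ('R'), WRITE('W'), EXEC('X'), CREATE('C'), ADMIN('A')
PERMISSIONS = {"R": "READ", "W": "WRITE", "X": "EXEC", "C": "CREATE", "A": "ADMIN"}


def expand_permissions(letters):
    """Return the permission names for a string such as 'RW' (hypothetical helper)."""
    unknown = [ch for ch in letters if ch not in PERMISSIONS]
    if unknown:
        raise ValueError("unknown permission letters: " + "".join(unknown))
    return [PERMISSIONS[ch] for ch in letters]


print(expand_permissions("RWXCA"))
print(expand_permissions("RW"))
```

So 'RW' in a grant call means read and write only, while 'RWXCA' requests every permission.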



Creating, Dropping, and Listing Tables

  • Create: the built-in help is quite good:
Here is some help for this command:
Creates a table. Pass a table name, and a set of column family
specifications (at least one), and, optionally, table configuration.
Column specification can be a simple string (name), or a dictionary
(dictionaries are described below in main help output), necessarily 
including NAME attribute. 
Examples:

Create a table with namespace=ns1 and table qualifier=t1
  hbase> create 'ns1:t1', {NAME => 'f1', VERSIONS => 5}

Create a table with namespace=default and table qualifier=t1
  hbase> create 't1', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}
  hbase> # The above in shorthand would be the following:
  hbase> create 't1', 'f1', 'f2', 'f3'
  hbase> create 't1', {NAME => 'f1', VERSIONS => 1, TTL => 2592000, BLOCKCACHE => true}
  hbase> create 't1', {NAME => 'f1', CONFIGURATION => {'hbase.hstore.blockingStoreFiles' => '10'}}
  
Table configuration options can be put at the end.
Examples:

  hbase> create 'ns1:t1', 'f1', SPLITS => ['10', '20', '30', '40']
  hbase> create 't1', 'f1', SPLITS => ['10', '20', '30', '40']
  hbase> create 't1', 'f1', SPLITS_FILE => 'splits.txt', OWNER => 'johndoe'
  hbase> create 't1', {NAME => 'f1', VERSIONS => 5}, METADATA => { 'mykey' => 'myvalue' }
  hbase> # Optionally pre-split the table into NUMREGIONS, using
  hbase> # SPLITALGO ("HexStringSplit", "UniformSplit" or classname)
  hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}
  hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit', REGION_REPLICATION => 2, CONFIGURATION => {'hbase.hregion.scan.loadColumnFamiliesOnDemand' => 'true'}}
  hbase> create 't1', {NAME => 'f1', DFS_REPLICATION => 1}

You can also keep around a reference to the created table:

  hbase> t1 = create 't1', 'f1'

Which gives you a reference to the table named 't1', on which you can then
call methods.


hbase(main):002:0> 

The following command creates a table named "lol" with two column families, "name" and "technique":

hbase(main):007:0> create 'lol',{NAME=>'name'},{NAME=>'technique'}
0 row(s) in 1.4040 seconds

=> Hbase::Table - lol
hbase(main):008:0> 
  • Drop:

To drop a table, first disable it and then drop it:

hbase(main):003:0> disable 'lol'
0 row(s) in 3.2680 seconds

hbase(main):004:0> drop 'lol'
0 row(s) in 1.2960 seconds

hbase(main):005:0> 
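  • Rename a table:

The HBase shell has no rename command, so renaming goes through a snapshot. After creating the table, take a temporary snapshot, clone the snapshot under the new table name, delete the temporary snapshot, and finally check the properties of the newly cloned table. These steps do not remove the original table: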

hbase(main):005:0> create 'lol',{NAME=>'name'},{NAME=>'technique'}
0 row(s) in 1.2550 seconds

=> Hbase::Table - lol
hbase(main):006:0> snapshot 'lol','lol_tmp'
0 row(s) in 0.3550 seconds

hbase(main):007:0> clone_snapshot 'lol_tmp','leagueoflengend'
0 row(s) in 0.5940 seconds

hbase(main):008:0> delete
delete                delete_all_snapshot   delete_snapshot
deleteall
hbase(main):008:0> delete_snapshot 'lol_tmp'
0 row(s) in 0.0640 seconds

hbase(main):010:0> desc 'leagueoflengend'
Table leagueoflengend is ENABLED                                                
leagueoflengend                                                                 
COLUMN FAMILIES DESCRIPTION                                                     
{NAME => 'name', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KE
EP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', CO
MPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65
536', REPLICATION_SCOPE => '0'}                                                 
{NAME => 'technique', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false
', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER
', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE =
> '65536', REPLICATION_SCOPE => '0'}                                            
2 row(s) in 0.0700 seconds

hbase(main):011:0> 
  • List all tables with list:
hbase(main):012:0> list
TABLE                                                                           
leagueoflengend                                                                 
lol                                                                             
2 row(s) in 0.0170 seconds

=> ["leagueoflengend", "lol"]
hbase(main):013:0> 



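Inserting, Deleting, Updating, and Querying Rows

We insert the data for "Mechanical Enemy - Rambo" into table 'lol'. The row with row key "3" then stores Rambo's information (name Rambo, title Mechanical Enemy, Q skill grilled at high temperature):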

hbase(main):064:0> put 'lol','3','name:name','Rambo'
0 row(s) in 0.0120 seconds

hbase(main):066:0> put 'lol','3','name:title','Mechanical Enemy'
0 row(s) in 0.0260 seconds

hbase(main):085:0> put 'lol','3','tech:q','grilled at high temperature'
0 row(s) in 0.0410 seconds

hbase(main):067:0> scan 'lol'
ROW                                            COLUMN+CELL                                                                                                                             
 1                                             column=name:fname, timestamp=1652003547121, value=Yone                                                                                  
 1                                             column=name:tech, timestamp=1652059717967, value=Staggered jade cut                                                                     
 1                                             column=name:title, timestamp=1652003832088, value=Demon Sword Soul                                                                      
 1                                             column=tech:q, timestamp=1652061296629, value=Staggered jade cut                                                                        
 2                                             column=name:name, timestamp=1652059786181, value=Yasuo                                                                                  
 2                                             column=name:tech, timestamp=1652059643106, value=Chopping Steel Flash                                                                   
 2                                             column=name:title, timestamp=1652059364721, value=Wind Swordsman                                                                        
 3                                             column=name:name, timestamp=1652059766646, value=Rambo                                                                                  
 3                                             column=name:title, timestamp=1652059808472, value=Mechanical Enemy                                                                      
 3                                             column=tech:q, timestamp=1652061445488, value=grilled at high temperature                                                               
3 row(s) in 0.0510 seconds


hbase(main):068:0>

deleteall: deletes all data in an entire row:

hbase(main):198:0> deleteall 'lol','2'
0 row(s) in 0.0060 seconds

hbase(main):199:0>

delete: deletes the data at the given row, column, and timestamp:

hbase(main):199:0> scan 'lol'
ROW                                            COLUMN+CELL                                                                                                                             
 1                                             column=name:cn-name, timestamp=1652065791331, value=\xE6\xB0\xB8\xE6\x81\xA9                                                            
 1                                             column=name:fname, timestamp=1652003547121, value=Yone                                                                                  
 1                                             column=name:tech, timestamp=1652059717967, value=Staggered jade cut                                                                     
 1                                             column=name:title, timestamp=1652003832088, value=Demon Sword Soul                                                                      
 1                                             column=tech:q, timestamp=1652061296629, value=Staggered jade cut                                                                        
 1                                             column=tech:w, timestamp=1652065118775, value=Spirit Cleave                                                                             
 2                                             column=name:cn-name, timestamp=1652065856945, value=\xE4\xBA\x9A\xE7\xB4\xA2                                                            
 2                                             column=name:name, timestamp=1652059786181, value=Yasuo                                                                                  
 2                                             column=name:tech, timestamp=1652059643106, value=Chopping Steel Flash                                                                   
 2                                             column=name:title, timestamp=1652059364721, value=Wind Swordsman                                                                        
 3                                             column=name:cn-name, timestamp=1652065881601, value=\xE5\x85\xB0\xE5\x8D\x9A                                                            
 3                                             column=name:name, timestamp=1652059766646, value=Rambo                                                                                  
 3                                             column=name:title, timestamp=1652059808472, value=Mechanical Enemy                                                                      
 3                                             column=tech:q, timestamp=1652061445488, value=grilled at high temperature                                                               
3 row(s) in 0.0410 seconds

hbase(main):200:0> delete 'lol','1','name:tech',1652059717967
0 row(s) in 0.0090 seconds

hbase(main):201:0>

To update a value, simply put to the same cell again:

put 'lol','3','name:name','Rambo_changed'



Adding, Deleting, and Modifying Column Families

Add a column family:

hbase(main):079:0> alter 'lol',NAME=>'tech'
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.9890 seconds

hbase(main):080:0> desc 'lol'
Table lol is ENABLED                                                                                                                                                                   
lol                                                                                                                                                                                    
COLUMN FAMILIES DESCRIPTION                                                                                                                                                            
{NAME => 'country', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE',
 MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}                                                                                            
{NAME => 'name', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MI
N_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}                                                                                               
{NAME => 'tech', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MI
N_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}                                                                                               
3 row(s) in 0.0170 seconds

hbase(main):081:0>

Delete a column family:

hbase(main):088:0> alter 'lol',NAME=>'country',METHOD=>'delete'
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 2.2430 seconds

hbase(main):089:0> desc 'lol'
Table lol is ENABLED                                                                                                                                                                   
lol                                                                                                                                                                                    
COLUMN FAMILIES DESCRIPTION                                                                                                                                                            
{NAME => 'name', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MI
N_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}                                                                                               
{NAME => 'tech', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MI
N_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}                                                                                               
2 row(s) in 0.0170 seconds

hbase(main):090:0>

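To change a column family, add the new column family first and then delete the old one. Note that deleting a column family also deletes the data stored in it.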



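Queries

Two of HBase's comparators are shown here: substring for fuzzy matching and binary for exact matching, similar to match and term in Elasticsearch:

substring: fuzzy matching, for example (I had previously added a tech column to the name column family of table lol):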

hbase(main):148:0> scan 'lol',FILTER=>"QualifierFilter (=,'substring:te')"
ROW                                            COLUMN+CELL                                                                                                                             
 1                                             column=name:tech, timestamp=1652059717967, value=Staggered jade cut                                                                     
 2                                             column=name:tech, timestamp=1652059643106, value=Chopping Steel Flash                                                                   
2 row(s) in 0.0220 seconds

hbase(main):149:0> scan 'lol',FILTER=>"QualifierFilter (=,'substring:tech')"
ROW                                            COLUMN+CELL                                                                                                                             
 1                                             column=name:tech, timestamp=1652059717967, value=Staggered jade cut                                                                     
 2                                             column=name:tech, timestamp=1652059643106, value=Chopping Steel Flash                                                                   
2 row(s) in 0.0070 seconds

hbase(main):150:0>

binary: exact matching, for example:

hbase(main):146:0> scan 'lol',FILTER=>"QualifierFilter (=,'binary:te')"
ROW                                            COLUMN+CELL                                                                                                                             
0 row(s) in 0.0280 seconds

hbase(main):147:0> scan 'lol',FILTER=>"QualifierFilter (=,'binary:tech')"
ROW                                            COLUMN+CELL                                                                                                                             
 1                                             column=name:tech, timestamp=1652059717967, value=Staggered jade cut                                                                     
 2                                             column=name:tech, timestamp=1652059643106, value=Chopping Steel Flash                                                                   
2 row(s) in 0.0160 seconds

hbase(main):148:0>
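To make the difference between the two comparators concrete, here is a plain-Python sketch of the matching rules (not HBase code; the function names are made up). It assumes that the substring comparator ignores case, which is how I understand HBase's SubstringComparator to behave, and that binary compares the raw bytes for equality:

```python
def substring_match(qualifier, needle):
    # Assumption: substring matching is case-insensitive.
    return needle.lower() in qualifier.lower()


def binary_match(qualifier, expected):
    # Exact match: every byte must be equal.
    return qualifier.encode("utf-8") == expected.encode("utf-8")


# Qualifiers that appear in the name column family in the scans above.
qualifiers = ["fname", "tech", "title"]

print([q for q in qualifiers if substring_match(q, "te")])   # ['tech']
print([q for q in qualifiers if binary_match(q, "te")])      # []
print([q for q in qualifiers if binary_match(q, "tech")])    # ['tech']
```

This mirrors the shell results: 'substring:te' finds the tech column, while 'binary:te' finds nothing because no qualifier is exactly "te".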
  • get

get retrieves the data of a single row. The built-in help:

Here is some help for this command:
Get row or cell contents; pass table name, row, and optionally
a dictionary of column(s), timestamp, timerange and versions. Examples:

  hbase> get 'ns1:t1', 'r1'
  hbase> get 't1', 'r1'
  hbase> get 't1', 'r1', {TIMERANGE => [ts1, ts2]}
  hbase> get 't1', 'r1', {COLUMN => 'c1'}
  hbase> get 't1', 'r1', {COLUMN => ['c1', 'c2', 'c3']}
  hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
  hbase> get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}
  hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}
  hbase> get 't1', 'r1', {FILTER => "ValueFilter(=, 'binary:abc')"}
  hbase> get 't1', 'r1', 'c1'
  hbase> get 't1', 'r1', 'c1', 'c2'
  hbase> get 't1', 'r1', ['c1', 'c2']
  hbase> get 't1', 'r1', {COLUMN => 'c1', ATTRIBUTES => {'mykey'=>'myvalue'}}
  hbase> get 't1', 'r1', {COLUMN => 'c1', AUTHORIZATIONS => ['PRIVATE','SECRET']}
  hbase> get 't1', 'r1', {CONSISTENCY => 'TIMELINE'}
  hbase> get 't1', 'r1', {CONSISTENCY => 'TIMELINE', REGION_REPLICA_ID => 1}

Besides the default 'toStringBinary' format, 'get' also supports custom formatting by
column.  A user can define a FORMATTER by adding it to the column name in the get
specification.  The FORMATTER can be stipulated: 

 1. either as a org.apache.hadoop.hbase.util.Bytes method name (e.g, toInt, toString)
 2. or as a custom class followed by method name: e.g. 'c(MyFormatterClass).format'.

Example formatting cf:qualifier1 and cf:qualifier2 both as Integers: 
  hbase> get 't1', 'r1' {COLUMN => ['cf:qualifier1:toInt',
    'cf:qualifier2:c(org.apache.hadoop.hbase.util.Bytes).toInt'] } 

Note that you can specify a FORMATTER by column only (cf:qualifier).  You cannot specify
a FORMATTER for all columns of a column family.
    
The same commands also can be run on a reference to a table (obtained via get_table or
create_table). Suppose you had a reference t to table 't1', the corresponding commands
would be:

  hbase> t.get 'r1'
  hbase> t.get 'r1', {TIMERANGE => [ts1, ts2]}
  hbase> t.get 'r1', {COLUMN => 'c1'}
  hbase> t.get 'r1', {COLUMN => ['c1', 'c2', 'c3']}
  hbase> t.get 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
  hbase> t.get 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}
  hbase> t.get 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}
  hbase> t.get 'r1', {FILTER => "ValueFilter(=, 'binary:abc')"}
  hbase> t.get 'r1', 'c1'
  hbase> t.get 'r1', 'c1', 'c2'
  hbase> t.get 'r1', ['c1', 'c2']
  hbase> t.get 'r1', {CONSISTENCY => 'TIMELINE'}
  hbase> t.get 'r1', {CONSISTENCY => 'TIMELINE', REGION_REPLICA_ID => 1}

The first command below queries the data in column family "tech" for row key "1"; the second queries both column families "name" and "tech" at once:

hbase(main):092:0> get 'lol','1',{COLUMN=>'tech'}
COLUMN                                         CELL                                                                                                                                    
 tech:q                                        timestamp=1652061296629, value=Staggered jade cut                                                                                       
1 row(s) in 0.0080 seconds

hbase(main):093:0> get 'lol','1',{COLUMN=>['name','tech']}
COLUMN                                         CELL                                                                                                                                    
 name:fname                                    timestamp=1652003547121, value=Yone                                                                                                     
 name:tech                                     timestamp=1652059717967, value=Staggered jade cut                                                                                       
 name:title                                    timestamp=1652003832088, value=Demon Sword Soul                                                                                         
 tech:q                                        timestamp=1652061296629, value=Staggered jade cut                                                                                       
4 row(s) in 0.0080 seconds

hbase(main):094:0>

To look up a specific value, for example the cell whose value is "Yone":

hbase(main):094:0> get 'lol','1',{FILTER=>"ValueFilter (=,'binary:Yone')"}
COLUMN                                         CELL                                                                                                                                    
 name:fname                                    timestamp=1652003547121, value=Yone                                                                                                     
1 row(s) in 0.0090 seconds

hbase(main):095:0>
  • scan

The built-in help is always thorough; reading it is basically enough.

Here is some help for this command:
Scan a table; pass table name and optionally a dictionary of scanner
specifications.  Scanner specifications may include one or more of:
TIMERANGE, FILTER, LIMIT, STARTROW, STOPROW, ROWPREFIXFILTER, TIMESTAMP,
MAXLENGTH or COLUMNS, CACHE or RAW, VERSIONS, ALL_METRICS or METRICS

If no columns are specified, all columns will be scanned.
To scan all members of a column family, leave the qualifier empty as in
'col_family'.

The filter can be specified in two ways:
1. Using a filterString - more information on this is available in the
Filter Language document attached to the HBASE-4176 JIRA
2. Using the entire package name of the filter.

If you wish to see metrics regarding the execution of the scan, the
ALL_METRICS boolean should be set to true. Alternatively, if you would
prefer to see only a subset of the metrics, the METRICS array can be 
defined to include the names of only the metrics you care about.

Some examples:

  hbase> scan 'hbase:meta'
  hbase> scan 'hbase:meta', {COLUMNS => 'info:regioninfo'}
  hbase> scan 'ns1:t1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
  hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
  hbase> scan 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804, 1303668904]}
  hbase> scan 't1', {REVERSED => true}
  hbase> scan 't1', {ALL_METRICS => true}
  hbase> scan 't1', {METRICS => ['RPC_RETRIES', 'ROWS_FILTERED']}
  hbase> scan 't1', {ROWPREFIXFILTER => 'row2', FILTER => "
    (QualifierFilter (>=, 'binary:xyz')) AND (TimestampsFilter ( 123, 456))"}
  hbase> scan 't1', {FILTER =>
    org.apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 0)}
  hbase> scan 't1', {CONSISTENCY => 'TIMELINE'}
For setting the Operation Attributes 
  hbase> scan 't1', { COLUMNS => ['c1', 'c2'], ATTRIBUTES => {'mykey' => 'myvalue'}}
  hbase> scan 't1', { COLUMNS => ['c1', 'c2'], AUTHORIZATIONS => ['PRIVATE','SECRET']}
For experts, there is an additional option -- CACHE_BLOCKS -- which
switches block caching for the scanner on (true) or off (false).  By
default it is enabled.  Examples:

  hbase> scan 't1', {COLUMNS => ['c1', 'c2'], CACHE_BLOCKS => false}

Also for experts, there is an advanced option -- RAW -- which instructs the
scanner to return all cells (including delete markers and uncollected deleted
cells). This option cannot be combined with requesting specific COLUMNS.
Disabled by default.  Example:

  hbase> scan 't1', {RAW => true, VERSIONS => 10}

Besides the default 'toStringBinary' format, 'scan' supports custom formatting
by column.  A user can define a FORMATTER by adding it to the column name in
the scan specification.  The FORMATTER can be stipulated: 

 1. either as a org.apache.hadoop.hbase.util.Bytes method name (e.g, toInt, toString)
 2. or as a custom class followed by method name: e.g. 'c(MyFormatterClass).format'.

Example formatting cf:qualifier1 and cf:qualifier2 both as Integers: 
  hbase> scan 't1', {COLUMNS => ['cf:qualifier1:toInt',
    'cf:qualifier2:c(org.apache.hadoop.hbase.util.Bytes).toInt'] } 

Note that you can specify a FORMATTER by column only (cf:qualifier).  You cannot
specify a FORMATTER for all columns of a column family.

Scan can also be used directly from a table, by first getting a reference to a
table, like such:

  hbase> t = get_table 't'
  hbase> t.scan

Note in the above situation, you can still provide all the filtering, columns,
options, etc as described above.

For example, to view the first two rows of table "lol":

Without any restriction, scan returns all the data in the table.

hbase(main):068:0> scan 'lol',{LIMIT=>2}
ROW                                            COLUMN+CELL                                                                                                                             
 1                                             column=name:fname, timestamp=1652003547121, value=Yone                                                                                  
 1                                             column=name:tech, timestamp=1652059717967, value=Staggered jade cut                                                                     
 1                                             column=name:title, timestamp=1652003832088, value=Demon Sword Soul                                                                      
 2                                             column=name:name, timestamp=1652059786181, value=Yasuo                                                                                  
 2                                             column=name:tech, timestamp=1652059643106, value=Chopping Steel Flash                                                                   
 2                                             column=name:title, timestamp=1652059364721, value=Wind Swordsman                                                                        
2 row(s) in 0.0330 seconds

hbase(main):069:0>

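To restrict the range of row keys:

Note that the interval is half-open, [STARTROW, ENDROW): the start row is included and the end row is excluded. The help text lists this option as STOPROW; as the output shows, the shell accepts ENDROW as well.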

hbase(main):069:0> scan 'lol',{STARTROW=>'2',ENDROW=>'3'}
ROW                                            COLUMN+CELL                                                                                                                             
 2                                             column=name:name, timestamp=1652059786181, value=Yasuo                                                                                  
 2                                             column=name:tech, timestamp=1652059643106, value=Chopping Steel Flash                                                                   
 2                                             column=name:title, timestamp=1652059364721, value=Wind Swordsman                                                                        
1 row(s) in 0.0340 seconds

hbase(main):070:0> scan 'lol',{STARTROW=>'2',ENDROW=>'4'}
ROW                                            COLUMN+CELL                                                                                                                             
 2                                             column=name:name, timestamp=1652059786181, value=Yasuo                                                                                  
 2                                             column=name:tech, timestamp=1652059643106, value=Chopping Steel Flash                                                                   
 2                                             column=name:title, timestamp=1652059364721, value=Wind Swordsman                                                                        
 3                                             column=name:name, timestamp=1652059766646, value=Rambo                                                                                  
 3                                             column=name:title, timestamp=1652059808472, value=Mechanical Enemy                                                                      
2 row(s) in 0.0090 seconds

hbase(main):071:0>
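The half-open range can be modelled in a few lines of plain Python (not HBase code; `scan_range` is a made-up helper). Row keys are treated as strings kept in sorted order, which also shows that the ordering is lexicographic rather than numeric. The key "10" is hypothetical and does not exist in the lol table:

```python
from bisect import bisect_left


def scan_range(sorted_row_keys, start_row, stop_row):
    """Return the keys in [start_row, stop_row): start included, stop excluded."""
    lo = bisect_left(sorted_row_keys, start_row)
    hi = bisect_left(sorted_row_keys, stop_row)
    return sorted_row_keys[lo:hi]


rows = ["1", "2", "3"]
print(scan_range(rows, "2", "3"))  # ['2']
print(scan_range(rows, "2", "4"))  # ['2', '3']

# Lexicographic order: "10" sorts between "1" and "2".
rows_with_10 = sorted(["1", "2", "3", "10"])
print(rows_with_10)                        # ['1', '10', '2', '3']
print(scan_range(rows_with_10, "1", "2"))  # ['1', '10']
```

This is why numeric row keys are often zero-padded when range scans should follow numeric order.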

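To restrict by timestamp (TimestampsFilter returns only the cells whose timestamp equals one of the listed values; it is not a range):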

hbase(main):074:0> scan 'lol',{FILTER=>"(TimestampsFilter (1652003547121,1652059643106))"}
ROW                                            COLUMN+CELL                                                                                                                             
 1                                             column=name:fname, timestamp=1652003547121, value=Yone                                                                                  
 2                                             column=name:tech, timestamp=1652059643106, value=Chopping Steel Flash                                                                   
2 row(s) in 0.0340 seconds

hbase(main):075:0>
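The scan above returns two cells even though other cells have timestamps between the two values, because the filter matches a set of exact timestamps. A plain-Python sketch of that behaviour (not HBase code; `timestamps_filter` is a made-up helper, and the cells are copied from the scan output earlier in this post):

```python
def timestamps_filter(cells, wanted_timestamps):
    """Keep only the cells whose timestamp is exactly one of the wanted values."""
    wanted = set(wanted_timestamps)
    return [cell for cell in cells if cell[2] in wanted]


# (row key, column, timestamp, value)
cells = [
    ("1", "name:fname", 1652003547121, "Yone"),
    ("1", "name:title", 1652003832088, "Demon Sword Soul"),
    ("2", "name:tech", 1652059643106, "Chopping Steel Flash"),
]

for cell in timestamps_filter(cells, [1652003547121, 1652059643106]):
    print(cell)
```

The "Demon Sword Soul" cell lies between the two timestamps but is not returned. For a real time range, the scan help lists the TIMERANGE option.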



Advanced Queries

  • List the filters HBase supports:
hbase(main):047:0> show_filters
DependentColumnFilter                                                           
KeyOnlyFilter                                                                   
ColumnCountGetFilter                                                            
SingleColumnValueFilter                                                         
PrefixFilter                                                                    
SingleColumnValueExcludeFilter                                                  
FirstKeyOnlyFilter                                                              
ColumnRangeFilter                                                               
TimestampsFilter                                                                
FamilyFilter                                                                    
QualifierFilter                                                                 
ColumnPrefixFilter                                                              
RowFilter                                                                       
MultipleColumnPrefixFilter                                                      
InclusiveStopFilter                                                             
PageFilter                                                                      
ValueFilter                                                                     
ColumnPaginationFilter                                                          

hbase(main):048:0> 
  • Row key filters

RowFilter: filters on the row key. For example, the following command fetches the rows whose row key starts with "1":

hbase(main):051:0> scan 'lol',FILTER=>"RowFilter(=,'binaryprefix:1')"
ROW                                            COLUMN+CELL                                                                                                                             
 1                                             column=name:fname, timestamp=1652003547121, value=Yone                                                                                  
 1                                             column=name:title, timestamp=1652003832088, value=Demon Sword Soul                                                                      
1 row(s) in 0.0240 seconds

hbase(main):052:0>

PrefixFilter: filters by row key prefix. The command above can also be written like this:

hbase(main):056:0> scan 'lol',FILTER=>"PrefixFilter('1')"
ROW                                            COLUMN+CELL                                                                                                                             
 1                                             column=name:fname, timestamp=1652003547121, value=Yone                                                                                  
 1                                             column=name:title, timestamp=1652003832088, value=Demon Sword Soul                                                                      
1 row(s) in 0.0260 seconds

hbase(main):057:0>
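
To make the prefix matching concrete, here is a minimal sketch in plain Python. It is only a toy model of what the two scans above do, not the HBase client API. The function name prefix_scan and the row "10" are my own assumptions, added purely for illustration.

```python
# Toy model of row-key prefix matching; plain Python, not the HBase client API.
# Each cell is (row key, column, value). Row "10" is hypothetical, added only
# to show that one prefix can match more than one row key.
CELLS = [
    (b"1", "name:fname", "Yone"),
    (b"1", "name:title", "Demon Sword Soul"),
    (b"10", "name:name", "hypothetical"),
    (b"2", "name:name", "Yasuo"),
    (b"3", "name:name", "Rambo"),
]


def prefix_scan(cells, prefix):
    """Keep every cell whose row key starts with the given prefix bytes."""
    return [cell for cell in cells if cell[0].startswith(prefix)]


for row, column, value in prefix_scan(CELLS, b"1"):
    print(row.decode(), column, value)
```

In this model the prefix "1" keeps both row "1" and the hypothetical row "10", because the comparison is on the leading bytes of the row key and not on a numeric value.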

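FirstKeyOnlyFilter: returns only the first cell of each logical row. It is handy for a quick look at what a table contains, and it also makes row counting more efficient: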

hbase(main):098:0> scan 'lol',{FILTER=>"FirstKeyOnlyFilter()"}
ROW                                            COLUMN+CELL                                                                                                                             
 1                                             column=name:fname, timestamp=1652003547121, value=Yone                                                                                  
 2                                             column=name:name, timestamp=1652059786181, value=Yasuo                                                                                  
 3                                             column=name:name, timestamp=1652059766646, value=Rambo                                                                                  
3 row(s) in 0.0130 seconds

hbase(main):099:0>

We can also use count directly to get the number of rows:

hbase(main):194:0> scan 'lol'
ROW                                            COLUMN+CELL                                                                                                                             
 1                                             column=name:cn-name, timestamp=1652065791331, value=\xE6\xB0\xB8\xE6\x81\xA9                                                            
 1                                             column=name:fname, timestamp=1652003547121, value=Yone                                                                                  
 1                                             column=name:tech, timestamp=1652059717967, value=Staggered jade cut                                                                     
 1                                             column=name:title, timestamp=1652003832088, value=Demon Sword Soul                                                                      
 1                                             column=tech:q, timestamp=1652061296629, value=Staggered jade cut                                                                        
 1                                             column=tech:w, timestamp=1652065118775, value=Spirit Cleave                                                                             
 2                                             column=name:cn-name, timestamp=1652065856945, value=\xE4\xBA\x9A\xE7\xB4\xA2                                                            
 2                                             column=name:name, timestamp=1652059786181, value=Yasuo                                                                                  
 2                                             column=name:tech, timestamp=1652059643106, value=Chopping Steel Flash                                                                   
 2                                             column=name:title, timestamp=1652059364721, value=Wind Swordsman                                                                        
 3                                             column=name:cn-name, timestamp=1652065881601, value=\xE5\x85\xB0\xE5\x8D\x9A                                                            
 3                                             column=name:name, timestamp=1652059766646, value=Rambo                                                                                  
 3                                             column=name:title, timestamp=1652059808472, value=Mechanical Enemy                                                                      
 3                                             column=tech:q, timestamp=1652061445488, value=grilled at high temperature                                                               
 4                                             column=name:name, timestamp=1652085807213, value=Foyego                                                                                 
4 row(s) in 0.0120 seconds

hbase(main):195:0> count 'lol'
4 row(s) in 0.0070 seconds

=> 4
hbase(main):196:0>
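
The link between FirstKeyOnlyFilter and counting can be sketched in plain Python. This is a toy model under my own assumptions (the names first_key_only and count_rows are made up, and the cells are a subset of the scan output above); it only shows why one cell per row is enough to count rows.

```python
from itertools import groupby

# Cells in scan order as (row key, column); a subset of the 'lol' scan above.
CELLS = [
    ("1", "name:cn-name"),
    ("1", "name:fname"),
    ("1", "tech:q"),
    ("2", "name:cn-name"),
    ("2", "name:name"),
    ("3", "name:cn-name"),
    ("3", "tech:q"),
    ("4", "name:name"),
]


def first_key_only(cells):
    """Return the first cell of each row, assuming cells are sorted by row key."""
    return [next(group) for _, group in groupby(cells, key=lambda cell: cell[0])]


def count_rows(cells):
    """Count rows by looking at only one cell per row."""
    return len(first_key_only(cells))


print(first_key_only(CELLS))
print(count_rows(CELLS))
```

The point of the design is that the row count depends only on how many distinct row keys exist, so the remaining cells of each row never need to be returned.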
  • Column family and column filters

FamilyFilter: filters on the column family name. For example, to find the data whose column family name contains "te":

hbase(main):101:0> scan 'lol',FILTER=>"FamilyFilter (=,'substring:te')"
ROW                                            COLUMN+CELL                                                                                                                             
 1                                             column=tech:q, timestamp=1652061296629, value=Staggered jade cut                                                                        
 3                                             column=tech:q, timestamp=1652061445488, value=grilled at high temperature                                                               
2 row(s) in 0.0270 seconds

hbase(main):102:0>

QualifierFilter: filters on the column name (qualifier). For example, to find the data in columns whose name contains "tech":

hbase(main):104:0> scan 'lol',FILTER=>"QualifierFilter (=,'substring:tech')"
ROW                                            COLUMN+CELL                                                                                                                             
 1                                             column=name:tech, timestamp=1652059717967, value=Staggered jade cut                                                                     
 2                                             column=name:tech, timestamp=1652059643106, value=Chopping Steel Flash                                                                   
2 row(s) in 0.0080 seconds

hbase(main):105:0>

ColumnPrefixFilter: filters on a column name prefix. For example, to find the data in columns whose name starts with "f":

hbase(main):106:0> scan 'lol',FILTER=>"ColumnPrefixFilter('f')"
ROW                                            COLUMN+CELL                                                                                                                             
 1                                             column=name:fname, timestamp=1652003547121, value=Yone                                                                                  
1 row(s) in 0.0060 seconds

hbase(main):107:0>

MultipleColumnPrefixFilter: filters on several column name prefixes at once. For example:

hbase(main):107:0> scan 'lol',FILTER=>"MultipleColumnPrefixFilter('na','f')"
ROW                                            COLUMN+CELL                                                                                                                             
 1                                             column=name:fname, timestamp=1652003547121, value=Yone                                                                                  
 2                                             column=name:name, timestamp=1652059786181, value=Yasuo                                                                                  
 3                                             column=name:name, timestamp=1652059766646, value=Rambo                                                                                  
3 row(s) in 0.0190 seconds

hbase(main):108:0>
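
The four filters above differ only in which part of the column they look at and how they match it. Here is a rough sketch in plain Python, again a toy model and not the HBase API. The function names are my own, and the data is taken from a later full scan of the table, so it also contains tech:w.

```python
# Cells as (row key, column family, qualifier), copied from a full scan of 'lol'.
CELLS = [
    ("1", "name", "fname"), ("1", "name", "tech"), ("1", "name", "title"),
    ("1", "tech", "q"), ("1", "tech", "w"),
    ("2", "name", "name"), ("2", "name", "tech"), ("2", "name", "title"),
    ("3", "name", "name"), ("3", "name", "title"), ("3", "tech", "q"),
]


def family_contains(cells, text):
    """Like FamilyFilter with a substring comparator: match on the family name."""
    return [cell for cell in cells if text in cell[1]]


def qualifier_contains(cells, text):
    """Like QualifierFilter with a substring comparator: match on the qualifier."""
    return [cell for cell in cells if text in cell[2]]


def qualifier_has_prefix(cells, *prefixes):
    """Like ColumnPrefixFilter (one prefix) or MultipleColumnPrefixFilter (several)."""
    return [cell for cell in cells if cell[2].startswith(prefixes)]


print(qualifier_contains(CELLS, "tech"))
print(qualifier_has_prefix(CELLS, "f"))
print(qualifier_has_prefix(CELLS, "na", "f"))
```

Note that "tech" is both a column family name and a qualifier under the name family in this table, which is why FamilyFilter and QualifierFilter return different cells for similar-looking patterns.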

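ColumnRangeFilter: filters columns by a range of column names. The two boolean arguments say whether the start bound and the end bound are inclusive. With true and false, as in the commands below, the range is left-closed and right-open, the same as STARTROW and ENDROW: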

hbase(main):110:0> scan 'lol'
ROW                                            COLUMN+CELL                                                                                                                             
 1                                             column=name:fname, timestamp=1652003547121, value=Yone                                                                                  
 1                                             column=name:tech, timestamp=1652059717967, value=Staggered jade cut                                                                     
 1                                             column=name:title, timestamp=1652003832088, value=Demon Sword Soul                                                                      
 1                                             column=tech:q, timestamp=1652061296629, value=Staggered jade cut                                                                        
 1                                             column=tech:w, timestamp=1652065118775, value=Spirit Cleave                                                                             
 2                                             column=name:name, timestamp=1652059786181, value=Yasuo                                                                                  
 2                                             column=name:tech, timestamp=1652059643106, value=Chopping Steel Flash                                                                   
 2                                             column=name:title, timestamp=1652059364721, value=Wind Swordsman                                                                        
 3                                             column=name:name, timestamp=1652059766646, value=Rambo                                                                                  
 3                                             column=name:title, timestamp=1652059808472, value=Mechanical Enemy                                                                      
 3                                             column=tech:q, timestamp=1652061445488, value=grilled at high temperature                                                               
3 row(s) in 0.0120 seconds

hbase(main):111:0> scan 'lol',FILTER=>"ColumnRangeFilter ('na',true,'te',false)"
ROW                                            COLUMN+CELL                                                                                                                             
 1                                             column=tech:q, timestamp=1652061296629, value=Staggered jade cut                                                                        
 2                                             column=name:name, timestamp=1652059786181, value=Yasuo                                                                                  
 3                                             column=name:name, timestamp=1652059766646, value=Rambo                                                                                  
 3                                             column=tech:q, timestamp=1652061445488, value=grilled at high temperature                                                               
3 row(s) in 0.0530 seconds

hbase(main):112:0> scan 'lol',FILTER=>"ColumnRangeFilter ('na',true,'wa',false)"
ROW                                            COLUMN+CELL                                                                                                                             
 1                                             column=name:tech, timestamp=1652059717967, value=Staggered jade cut                                                                     
 1                                             column=name:title, timestamp=1652003832088, value=Demon Sword Soul                                                                      
 1                                             column=tech:q, timestamp=1652061296629, value=Staggered jade cut                                                                        
 1                                             column=tech:w, timestamp=1652065118775, value=Spirit Cleave                                                                             
 2                                             column=name:name, timestamp=1652059786181, value=Yasuo                                                                                  
 2                                             column=name:tech, timestamp=1652059643106, value=Chopping Steel Flash                                                                   
 2                                             column=name:title, timestamp=1652059364721, value=Wind Swordsman                                                                        
 3                                             column=name:name, timestamp=1652059766646, value=Rambo                                                                                  
 3                                             column=name:title, timestamp=1652059808472, value=Mechanical Enemy                                                                      
 3                                             column=tech:q, timestamp=1652061445488, value=grilled at high temperature                                                               
3 row(s) in 0.0330 seconds

hbase(main):113:0>
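
The range check can be written down in a few lines of plain Python. This is only a sketch of the comparison as I understand it, not HBase code, and the name in_column_range is my own. Column names are compared as bytes in lexicographic order, which is why name:tech falls outside the range ending at 'te' (the string 'tech' sorts after 'te').

```python
def in_column_range(qualifier, min_col, min_inclusive, max_col, max_inclusive):
    """Check one qualifier against a range, comparing bytes lexicographically."""
    lower_ok = qualifier >= min_col if min_inclusive else qualifier > min_col
    upper_ok = qualifier <= max_col if max_inclusive else qualifier < max_col
    return lower_ok and upper_ok


# Qualifiers of row 1 in scan order, copied from the full scan above.
ROW_1 = [b"fname", b"tech", b"title", b"q", b"w"]

narrow = [q for q in ROW_1 if in_column_range(q, b"na", True, b"te", False)]
wide = [q for q in ROW_1 if in_column_range(q, b"na", True, b"wa", False)]

print(narrow)
print(wide)
```

For row 1 the narrow range keeps only q and the wide range keeps tech, title, q and w, which agrees with the two scan outputs above.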
  • Value filters

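ValueFilter: filters on cell values. Before running this I inserted each hero's Chinese name into the lol table. HBase stores values as raw bytes, and the shell displays the non-ASCII UTF-8 bytes of the Chinese text as \x hex escapes: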

hbase(main):117:0> scan 'lol'
ROW                                            COLUMN+CELL                                                                                                                             
 1                                             column=name:cn-name, timestamp=1652065791331, value=\xE6\xB0\xB8\xE6\x81\xA9                                                            
 1                                             column=name:fname, timestamp=1652003547121, value=Yone                                                                                  
 1                                             column=name:tech, timestamp=1652059717967, value=Staggered jade cut                                                                     
 1                                             column=name:title, timestamp=1652003832088, value=Demon Sword Soul                                                                      
 1                                             column=tech:q, timestamp=1652061296629, value=Staggered jade cut                                                                        
 1                                             column=tech:w, timestamp=1652065118775, value=Spirit Cleave                                                                             
 2                                             column=name:cn-name, timestamp=1652065856945, value=\xE4\xBA\x9A\xE7\xB4\xA2                                                            
 2                                             column=name:name, timestamp=1652059786181, value=Yasuo                                                                                  
 2                                             column=name:tech, timestamp=1652059643106, value=Chopping Steel Flash                                                                   
 2                                             column=name:title, timestamp=1652059364721, value=Wind Swordsman                                                                        
 3                                             column=name:cn-name, timestamp=1652065881601, value=\xE5\x85\xB0\xE5\x8D\x9A                                                            
 3                                             column=name:name, timestamp=1652059766646, value=Rambo                                                                                  
 3                                             column=name:title, timestamp=1652059808472, value=Mechanical Enemy                                                                      
 3                                             column=tech:q, timestamp=1652061445488, value=grilled at high temperature                                                               
3 row(s) in 0.0230 seconds

hbase(main):118:0> scan 'lol',FILTER=>"ValueFilter (=,'substring:永恩')"
ROW                                            COLUMN+CELL                                                                                                                             
 1                                             column=name:cn-name, timestamp=1652065791331, value=\xE6\xB0\xB8\xE6\x81\xA9                                                            
1 row(s) in 0.0290 seconds

hbase(main):145:0> scan 'lol',FILTER=>"ValueFilter (=,'substring:ne')"
ROW                                            COLUMN+CELL                                                                                                                             
 1                                             column=name:fname, timestamp=1652003547121, value=Yone                                                                                  
 3                                             column=name:title, timestamp=1652059808472, value=Mechanical Enemy                                                                      
2 row(s) in 0.0300 seconds


hbase(main):119:0>

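Tip: to display Chinese text in the hbase shell, append :toString to the column name so that the value is decoded as a string: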

hbase(main):144:0> scan 'lol',{COLUMNS => 'name:cn-name:toString'}
ROW                                            COLUMN+CELL                                                                                                                             
 1                                             column=name:cn-name, timestamp=1652065791331, value=永恩                                                                              
 2                                             column=name:cn-name, timestamp=1652065856945, value=亚索                                                                              
 3                                             column=name:cn-name, timestamp=1652065881601, value=兰博                                                                              
3 row(s) in 0.0210 seconds

hbase(main):145:0>
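
The \x escapes shown earlier are just the UTF-8 bytes of the Chinese names. A small Python sketch can reproduce them. The function shell_style_escape is my own approximation of how the shell prints bytes (printable ASCII as-is, everything else as \xHH); it is not the shell's actual code.

```python
def shell_style_escape(raw: bytes) -> str:
    """Approximate the shell display: printable ASCII as-is, other bytes as \\xHH."""
    return "".join(
        chr(b) if 32 <= b < 127 and b != 0x5C else "\\x%02X" % b
        for b in raw
    )


stored = "永恩".encode("utf-8")       # the bytes kept in the cell
print(shell_style_escape(stored))     # \xE6\xB0\xB8\xE6\x81\xA9
print(stored.decode("utf-8"))         # 永恩
print(shell_style_escape(b"Yone"))    # Yone
```

Decoding the same bytes as UTF-8 gives back the original text, which is the effect the :toString suffix has on the scan output.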



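Conclusion

This article covers only very basic syntax. HBase has many more features that are not shown here, such as importing data, and I look forward to learning more.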



Acknowledgments


HBase Basic Syntax


Getting Started with HBase: A Summary of Filters Commonly Used in Queries



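Copyright notice: this is an original article by weixin_46003360, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.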