HBase Compression Algorithms: Setting and Changing Them
Published: 2019-06-29

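Compression trades CPU time for I/O throughput and disk space. Unless there is a specific reason not to, setting compression on each column family is recommended. There are three main algorithms: GZIP, LZO, and Snappy. The author recommends Snappy because it offers good encoding and decoding speed with an acceptable compression ratio.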

HBase comes with support for a number of compression algorithms that can be enabled at the column family level. Enabling compression is recommended unless you have a reason not to do so, for example, when storing already compressed content such as JPEG images. For every other use case, compression will usually yield better overall performance, because the CPU overhead of compressing and decompressing is less than the cost of reading more data from disk.
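The point about already compressed content can be checked with a small experiment. The sketch below is not HBase code: it uses Python's standard `zlib` module (the DEFLATE algorithm that GZIP is built on) as a stand-in codec, and random bytes as a stand-in for JPEG-like data. The sample row text and the helper name `remaining_fraction` are made up for illustration.

```python
import os
import zlib


def remaining_fraction(data: bytes, level: int = 6) -> float:
    """Return compressed size divided by original size (lower is better)."""
    if not data:
        raise ValueError("data must be non-empty")
    return len(zlib.compress(data, level)) / len(data)


# Repetitive, row-like text (hypothetical sample data).
text_like = b"row-0001,colfam1:status,active,2019-06-29\n" * 2000

# Random bytes are used here as an assumption-laden stand-in for content
# that is already compressed, such as JPEG images.
already_compressed = os.urandom(len(text_like))

print(f"text-like: {remaining_fraction(text_like):.1%} remaining")
print(f"random:    {remaining_fraction(already_compressed):.1%} remaining")
```

The repetitive input shrinks to a small fraction of its size, while the random input stays at roughly its original size, so the CPU spent compressing it buys nothing.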

Available Codecs

You can choose from a fixed list of supported compression algorithms. They have different qualities when it comes to compression ratio, as well as CPU and installation requirements.

Table 11.1. Comparison between compression algorithms

Algorithm       % remaining   Encoding    Decoding
GZIP            13.4%         21 MB/s     118 MB/s
LZO             20.5%         135 MB/s    410 MB/s
Zippy/Snappy    22.2%         172 MB/s    409 MB/s


Note that some of the algorithms have a better compression ratio, while others are faster at encoding and much faster at decoding. Choose the one that best suits your use case.
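To make the trade-off in Table 11.1 concrete, here is a rough back-of-the-envelope sketch. It assumes the throughput figures are measured against uncompressed data and ignores disk and network time entirely, so treat the results as illustrative only. The function name `estimate` and the 1000 MB workload are hypothetical.

```python
# Figures copied from Table 11.1:
# (fraction remaining, encode MB/s, decode MB/s)
CODECS = {
    "GZIP": (0.134, 21.0, 118.0),
    "LZO": (0.205, 135.0, 410.0),
    "Zippy/Snappy": (0.222, 172.0, 409.0),
}


def estimate(raw_mb: float, codec: str) -> dict:
    """Estimate stored size and CPU-bound encode/decode time for raw_mb of data.

    Assumption: throughput is relative to the uncompressed size.
    """
    remaining, encode_rate, decode_rate = CODECS[codec]
    return {
        "stored_mb": raw_mb * remaining,
        "encode_s": raw_mb / encode_rate,
        "decode_s": raw_mb / decode_rate,
    }


for name in CODECS:
    result = estimate(1000.0, name)
    print(
        f"{name:<13} stored={result['stored_mb']:.0f} MB  "
        f"encode={result['encode_s']:.1f} s  decode={result['decode_s']:.1f} s"
    )
```

Under these assumptions GZIP stores the least data but spends several times longer encoding than LZO or Snappy, which is the reasoning behind preferring Snappy when write throughput matters.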

Enabling Compression

Enabling compression requires installing the JNI and native compression libraries (unless you only want to use the Java-based GZIP compression), as described above, and specifying the chosen algorithm in the column family schema.

One way to accomplish this is during the creation of the table. The possible values are listed in :

  hbase(main):001:0> create 'testtable', { NAME => 'colfam1', COMPRESSION => 'GZ' }
  0 row(s) in 1.1920 seconds

  hbase(main):012:0> describe 'testtable'
  DESCRIPTION                                                 ENABLED
  {NAME => 'testtable', FAMILIES => [{NAME => 'colfam1',      true
  BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS
  => '3', COMPRESSION => 'GZ', TTL => '2147483647', BLOCKSIZE
  => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
  1 row(s) in 0.0400 seconds

The describe shell command reads back the schema of the newly created table. You can see that compression is set to GZIP (using the shorter "GZ" value, as required). Another way to enable, change, or disable the compression algorithm is to use the alter command on an existing table:

  hbase(main):013:0> create 'testtable2', 'colfam1'
  0 row(s) in 1.1920 seconds

  hbase(main):014:0> disable 'testtable2'
  0 row(s) in 2.0650 seconds

  hbase(main):016:0> alter 'testtable2', { NAME => 'colfam1', COMPRESSION => 'GZ' }
  0 row(s) in 0.2190 seconds

  hbase(main):017:0> enable 'testtable2'
  0 row(s) in 2.0410 seconds

Note how the table was first disabled. This is necessary to perform the alteration of the column family definition. The final enable command brings the table back online.

Reposted from: http://hwnbm.baihongyu.com/
