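The official reference guide introduces filters in section 68, Client Request Filters.
This article is based on hbase-1.1.2.
##1. Background concepts
1.1 Filters apply to a Scan or a Get; setting a Filter optimizes the query by restricting what it returns.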
Get and Scan instances can be optionally configured with filters which are applied on the RegionServer.
Filters can be confusing because there are many different types,
and it is best to approach them by understanding the groups of Filter functionality.
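1.2 FilterList is a collection of filters: it lets you attach several Filters to one query.
The operator passed to the FilterList decides how its member Filters are combined:
- FilterList.Operator.MUST_PASS_ONE means OR: it is enough for one Filter to match.
- FilterList.Operator.MUST_PASS_ALL means AND: every Filter must match.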
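The two operators can be pictured with a small plain-Java model. This is only a sketch of the AND/OR semantics described above, not HBase code: it uses `java.util.function.Predicate` over row-key strings in place of real filters, and the class `FilterListModel`, its `combine` helper and the two sample predicates are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class FilterListModel {
    /** Mirrors the two FilterList.Operator values named above. */
    public enum Operator { MUST_PASS_ALL, MUST_PASS_ONE }

    /** Combines member predicates as the operator describes: AND or OR. */
    public static Predicate<String> combine(Operator op, List<Predicate<String>> members) {
        return row -> op == Operator.MUST_PASS_ALL
                ? members.stream().allMatch(m -> m.test(row))
                : members.stream().anyMatch(m -> m.test(row));
    }

    public static void main(String[] args) {
        // Two hypothetical member filters over a row key of the form yyyyMMdd-UserID.
        Predicate<String> from2020 = row -> row.startsWith("2020");
        Predicate<String> userU42 = row -> row.endsWith("-u42");
        List<Predicate<String>> members = Arrays.asList(from2020, userU42);

        Predicate<String> all = combine(Operator.MUST_PASS_ALL, members);
        Predicate<String> one = combine(Operator.MUST_PASS_ONE, members);

        System.out.println(all.test("20200101-u42")); // true: both match
        System.out.println(all.test("20200101-u07")); // false: second does not match
        System.out.println(one.test("20200101-u07")); // true: first matches
        System.out.println(one.test("20190101-u07")); // false: neither matches
    }
}
```

In the HBase client the same choice is made by the operator passed to the FilterList constructor together with the member filters.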
1.3 org.apache.hadoop.hbase.filter.CompareFilter is the high-level abstract class for comparison-based Filters. It provides two things:
- operator (equal, greater, not equal, etc), defined in enum CompareOp:
/** Comparison operators. */
@InterfaceAudience.Public
@InterfaceStability.Stable
public enum CompareOp {
  /** less than */
  LESS,
  /** less than or equal to */
  LESS_OR_EQUAL,
  /** equals */
  EQUAL,
  /** not equal */
  NOT_EQUAL,
  /** greater than or equal to */
  GREATER_OR_EQUAL,
  /** greater than */
  GREATER,
  /** no operation */
  NO_OP,
}
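- comparator: its type is ByteArrayComparable, an abstract class. In an IDE, the type-hierarchy shortcut (ctrl+t in Eclipse) lists its subclasses, such as BinaryComparator, RegexStringComparator and SubstringComparator.
Filters built on a comparison take both of these parameters.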
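How an operator and a comparator work together can be sketched in plain Java. The model below is an illustration under stated assumptions, not the HBase implementation: it assumes the comparator orders values by unsigned lexicographic byte order (the ordering HBase uses for row keys), it leaves NO_OP out, and `CompareModel`, `compareBytes` and `passes` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

public class CompareModel {
    /** Local copy of the operators listed in CompareOp above (NO_OP left out). */
    public enum Op { LESS, LESS_OR_EQUAL, EQUAL, NOT_EQUAL, GREATER_OR_EQUAL, GREATER }

    /** Unsigned lexicographic comparison of two byte arrays. */
    public static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    /** True when "stored OP expected" holds, i.e. the stored value is kept. */
    public static boolean passes(Op op, byte[] stored, byte[] expected) {
        int c = compareBytes(stored, expected);
        switch (op) {
            case LESS:             return c < 0;
            case LESS_OR_EQUAL:    return c <= 0;
            case EQUAL:            return c == 0;
            case NOT_EQUAL:        return c != 0;
            case GREATER_OR_EQUAL: return c >= 0;
            case GREATER:          return c > 0;
            default: throw new IllegalArgumentException("unsupported operator: " + op);
        }
    }

    private static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(passes(Op.LESS, bytes("row3"), bytes("row5")));             // true
        System.out.println(passes(Op.GREATER_OR_EQUAL, bytes("row5"), bytes("row5"))); // true
        // Byte order, not numeric order: "row10" sorts before "row5".
        System.out.println(passes(Op.LESS, bytes("row10"), bytes("row5")));            // true
    }
}
```

With RegexStringComparator or SubstringComparator only EQUAL and NOT_EQUAL are meaningful, because those comparators report only whether the value matched.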
##2. Filters can be grouped by what they filter on
###2.1 Filtering on Column Value
###2.1.1 SingleColumnValueFilter
SingleColumnValueFilter singleColumnValueFilter = new SingleColumnValueFilter(
    "cf1".getBytes(),  // column family
    "data".getBytes(), // column qualifier
    CompareOp.EQUAL,
    new SubstringComparator("223.73.39.213")); // comparator: substring match
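###2.1.2 ColumnValueFilter (introduced in 2.0.0 as a complement to SingleColumnValueFilter)
###2.2 KeyValue Metadata
Because HBase stores data internally as key-value pairs, KeyValue Metadata Filters evaluate the existence of keys (i.e., ColumnFamily:Column qualifiers) for a row, as opposed to the values covered in the previous section.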
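The difference between selecting on keys and selecting on values can be shown with a minimal plain-Java sketch. It models one row as a sorted map from qualifier to value and keeps cells by looking at the qualifier only, in the spirit of a prefix match on column names. The class `QualifierPrefixModel`, the helper `selectByPrefix` and the sample data are hypothetical and stand in for the HBase types.

```java
import java.util.Map;
import java.util.TreeMap;

public class QualifierPrefixModel {
    /** Keeps the cells whose qualifier starts with the prefix; values are never inspected. */
    public static Map<String, String> selectByPrefix(Map<String, String> row, String prefix) {
        Map<String, String> kept = new TreeMap<>();
        for (Map.Entry<String, String> cell : row.entrySet()) {
            if (cell.getKey().startsWith(prefix)) {
                kept.put(cell.getKey(), cell.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> row = new TreeMap<>();
        row.put("addr_city", "Shenzhen");
        row.put("addr_zip", "518000");
        row.put("name", "addr_unknown"); // the value starts with the prefix, the qualifier does not

        // prints {addr_city=Shenzhen, addr_zip=518000}
        System.out.println(selectByPrefix(row, "addr_"));
    }
}
```

The `name` cell is dropped even though its value begins with `addr_`, which is the point of a metadata filter: only the key is evaluated.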
###2.2.1 FamilyFilter
###2.2.2 QualifierFilter
###2.2.3 ColumnPrefixFilter
###2.2.4 MultipleColumnPrefixFilter
###2.2.5 ColumnRangeFilter
###2.3 Filtering on rowkey
It is generally a better idea to use the startRow/stopRow methods on Scan for row selection; however, RowFilter can also be used.
###2.3.1 RowFilter
RowFilter rowFilter = new RowFilter(CompareOp.EQUAL, new RegexStringComparator(reg)); // the two basic parameters: operator and comparator
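####Suffix matching on the RowKey: for example, if the ROWKEY has the form yyyyMMdd-UserID, how do you query by UserID?
- Select the values of a given userId within the time range [time1, time2)
####Solution: combine the start row, the stop row and a RowFilter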
scan 'tablename', {STARTROW=>'time1+uid', STOPROW=>'time2+uid', FILTER=>"RowFilter(=,'regexstring:.*uid$')"}
####Implementation in code
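import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.*;
import org.apache.hadoop.hbase.util.Bytes;

Scan scan = new Scan();
scan.setStartRow(Bytes.toBytes(time1 + uid)); // start row, inclusive
scan.setStopRow(Bytes.toBytes(time2 + uid));  // stop row, exclusive
// RegexStringComparator matches anywhere in the key, so anchor with $ to match the suffix only.
Filter filter = new RowFilter(CompareFilter.CompareOp.EQUAL,
    new RegexStringComparator(".*" + uid + "$"));
scan.setFilter(filter);
// Connection, Table and ResultScanner are closed automatically by try-with-resources.
try (Connection connection = ConnectionFactory.createConnection(hbaseConfig);
     Table table = connection.getTable(TableName.valueOf(tableName));
     ResultScanner rs = table.getScanner(scan)) {
  for (Result r : rs) {
    System.out.println(Bytes.toString(r.getRow())); // print each matching rowkey once
  }
}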
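The way the row range and the regex cooperate can be checked without a cluster. The following plain-Java sketch models the table as a sorted set of row-key strings, takes the [startRow, stopRow) slice first and then applies the suffix pattern to what is left. It assumes ASCII row keys (so that string order equals byte order), and `RowKeySuffixScanModel`, `scan` and the sample keys are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.regex.Pattern;

public class RowKeySuffixScanModel {
    /** Returns the keys in [startRow, stopRow) that end with uid. */
    public static List<String> scan(NavigableSet<String> sortedKeys,
                                    String startRow, String stopRow, String uid) {
        // Pattern.quote guards against regex metacharacters inside the user id.
        Pattern suffix = Pattern.compile(".*" + Pattern.quote(uid) + "$");
        List<String> hits = new ArrayList<>();
        // The range narrows the candidates; the regex then runs only on that slice.
        for (String key : sortedKeys.subSet(startRow, true, stopRow, false)) {
            if (suffix.matcher(key).find()) {
                hits.add(key);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        NavigableSet<String> keys = new TreeSet<>(Arrays.asList(
                "20200101-u07", "20200101-u42", "20200102-u07",
                "20200102-u42", "20200103-u42", "20200104-u42"));

        // prints [20200101-u42, 20200102-u42]
        System.out.println(scan(keys, "20200101-u42", "20200103-u42", "u42"));
    }
}
```

The key `20200102-u07` lies inside the range but is rejected by the pattern, while `20200103-u42` matches the pattern but is excluded because the stop row is exclusive.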
###2.4 Utility
###2.4.1 FirstKeyOnlyFilter
This is primarily used for rowcount jobs.