Author: 手机用户2502891655 | Source: Internet | 2023-09-11 10:50
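I want to write a custom analyzer in PyLucene.
Normally, in Java Lucene, when you write an analyzer class, your class inherits from Lucene's Analyzer class.
But PyLucene uses JCC, the Java to C++/Python compiler.
So how do you make a Python class inherit from a Java class through JCC, and in particular, how do you write a custom PyLucene analyzer?
Thanks.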
Solution:
Here is an example of an analyzer that wraps the EdgeNGram filter.
import lucene

# Assumes the older flat-namespace PyLucene (3.x-era) API, where every class lives
# in the lucene module. lucene.initVM() must be called once before any of this runs.

class EdgeNGramAnalyzer(lucene.PythonAnalyzer):
    '''
    This is an example of a custom Analyzer (in this case an edge-n-gram analyzer).
    EdgeNGram analyzers are good for type-ahead.
    '''

    def __init__(self, side, minlength, maxlength):
        '''
        Args:
            side[enum] Can be one of lucene.EdgeNGramTokenFilter.Side.FRONT
                or lucene.EdgeNGramTokenFilter.Side.BACK
            minlength[int]
            maxlength[int]
        '''
        # Initialize the Java-side base class so JCC can dispatch back to this object.
        lucene.PythonAnalyzer.__init__(self)
        self.side = side
        self.minlength = minlength
        self.maxlength = maxlength

    def tokenStream(self, fieldName, reader):
        # Build the chain: tokenize and lowercase, normalize, drop stop words,
        # fold accents to ASCII, then emit edge n-grams.
        result = lucene.LowerCaseTokenizer(lucene.Version.LUCENE_CURRENT, reader)
        result = lucene.StandardFilter(result)
        result = lucene.StopFilter(True, result, lucene.StopAnalyzer.ENGLISH_STOP_WORDS_SET)
        result = lucene.ASCIIFoldingFilter(result)
        result = lucene.EdgeNGramTokenFilter(result, self.side, self.minlength, self.maxlength)
        return result
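
To make concrete what the last stage of that filter chain produces, here is a plain-Python sketch that runs without PyLucene. The helper edge_ngrams and its string values "front" and "back" are hypothetical stand-ins for illustration, not part of the Lucene API. It assumes, as I understand the filter, that grams longer than the token are simply not emitted.

```python
def edge_ngrams(token, side="front", minlength=1, maxlength=3):
    """Return the edge n-grams of one token (hypothetical helper, not a Lucene API)."""
    if side not in ("front", "back"):
        raise ValueError("side must be 'front' or 'back'")
    if minlength < 1 or maxlength < minlength:
        raise ValueError("require 1 <= minlength <= maxlength")

    grams = []
    # Assumption: sizes beyond the token length are skipped rather than padded.
    for size in range(minlength, min(maxlength, len(token)) + 1):
        if side == "front":
            grams.append(token[:size])   # prefixes, useful for type-ahead
        else:
            grams.append(token[-size:])  # suffixes
    return grams


print(edge_ngrams("lucene", "front", 1, 3))  # ['l', 'lu', 'luc']
print(edge_ngrams("lucene", "back", 2, 4))   # ['ne', 'ene', 'cene']
```

This is why the analyzer suits type-ahead: if "lucene" is indexed with front grams, a partially typed query such as "luc" matches as an exact term instead of requiring a prefix query.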