是否有快速，准确的Lucene荧光笔？-Isthereafast,accurateHighlighterforLucene?

作者：Ray依依 | 来源：互联网 | 2023-05-19 17:37

Ivebeenusingthe(Java)HighlighterforLucene(intheSandboxpackage)forsometime.However,t

I've been using the (Java) Highlighter for Lucene (in the Sandbox package) for some time. However, this isn't really very accurate when it comes to matching the correct terms in search results - it works well for simple queries, for example searching for two separate words will highlight both code fragments in the results.

我已经使用(Java)Highlighter for Lucene(在Sandbox包中)一段时间了。但是,在匹配搜索结果中的正确术语时,这并不是非常准确 - 它适用于简单查询,例如,搜索两个单独的单词将突出显示结果中的两个代码片段。

However, it doesn't act well with more complicated queries. In the simplest case, phrase queries such as "Stack Overflow" will match all occurrences of Stack or Overflow in the highlighting, which gives the impression to the user that it isn't working very well.

但是,对于更复杂的查询,它不能很好地运行。在最简单的情况下,诸如“Stack Overflow”之类的短语查询将匹配突出显示中所有出现的Stack或Overflow,这给用户留下了不能很好地工作的印象。

I tried applying the fix here but that came with a lot of performance caveats, and at the end of the day was just plain unusable. The performance is especially an issue on wildcard queries. This is due to the way that the highlighting works; instead of just working on the querystring and the text it parses it as Lucene would and then looks for all the matches that Lucene has made; unfortunately this means that for certain wildcard queries it can be looking for matches to 2000+ clauses on large documents, and it's simply not fast enough.

我尝试在这里应用修复程序,但这带来了很多性能警告,并且在一天结束时只是普遍无法使用。性能尤其是通配符查询的问题。这是由于突出显示的工作方式;它不是仅仅处理查询字符串和文本,而是像Lucene那样解析它,然后查找Lucene所做的所有匹配;不幸的是,这意味着对于某些通配符查询,它可以在大型文档上查找与2000+子句的匹配,并且它的速度不够快。

Is there any faster implementation of an accurate highlighter?

有没有更快的实现准确的荧光笔?

3 个解决方案

#1

There is a new faster highlighter (needs to be patched in but will be part of release 2.9)

有一个新的更快的荧光笔(需要修补但将成为2.9版的一部分)

https://issues.apache.org/jira/browse/LUCENE-1522

and a back-reference to this question

并且对这个问题的反向引用

#2

You could look into using Solr. http://lucene.apache.org/solr

你可以考虑使用Solr。 http://lucene.apache.org/solr

Solr is a sort of generic search application that uses Lucene and supports highlighting. It's possible that the highlighting in Solr is usable as an API outside of Solr. You could also look at how Solr does it for inspiration.

Solr是一种使用Lucene并支持突出显示的通用搜索应用程序。 Solr中的突出显示可能可用作Solr之外的API。您还可以看看Solr如何为灵感做到这一点。

#3

I've been reading on the subject and came across spanQuery which would return to you the span of the matched term or terms in the field that matched.

我一直在阅读这个主题,并遇到了spanQuery,它将返回匹配字段中匹配的术语或术语的范围。

推荐阅读

spring
javax.mail.search.BodyTerm.matchPart()方法的使用及代码示例

javax.mail.search.BodyTerm.matchPart()方法的使用及代码示例 ... [详细]

蜡笔小新 2024-11-13 15:24:50
byte
HDFS API

Hadoop的文件操作位于包org.apache.hadoop.fs里面，能够进行新建、删除、修改等操作。比较重要的几个类：(1)Configurati ... [详细]

蜡笔小新 2024-11-13 17:31:50
tree
com.sun.javadoc.PackageDoc.exceptions()方法的使用及代码示例

com.sun.javadoc.PackageDoc.exceptions()方法的使用及代码示例 ... [详细]

蜡笔小新 2024-11-13 10:47:33
filter
如何使用 `org.apache.tomcat.websocket.server.WsServerContainer.findMapping()` 方法及其代码示例解析

如何使用 `org.apache.tomcat.websocket.server.WsServerContainer.findMapping()` 方法及其代码示例解析 ... [详细]

蜡笔小新 2024-11-11 10:08:55
filter
Python基础：使用NLTK和Python构建机器学习应用

本文节选自《NLTK基础教程——用NLTK和Python库构建机器学习应用》一书的第1章第1.2节，作者Nitin Hardeniya。本文将带领读者快速了解Python的基础知识，为后续的机器学习应用打下坚实的基础。 ... [详细]

蜡笔小新 2024-11-13 21:23:34
instance
Java反射机制详解及应用场景

本文详细介绍了Java反射机制的基本概念、获取Class对象的方法、反射的主要功能及其在实际开发中的应用。通过具体示例，帮助读者更好地理解和使用Java反射。 ... [详细]

蜡笔小新 2024-11-13 16:08:08
instance
Spring – Bean Life Cycle

Spring – Bean Life Cycle ... [详细]

蜡笔小新 2024-11-13 13:24:40
instance
Java 编程错误：对象无法转换为 long 类型

本文介绍了在 Java 编程中遇到的一个常见错误：对象无法转换为 long 类型，并提供了详细的解决方案。 ... [详细]

蜡笔小新 2024-11-13 10:57:24
instance
Spring详解（六）AOP

原文网址：https:www.cnblogs.comysoceanp7476379.html目录1、AOP什么？2、需求3、解决办法1:使用静态代理4 ... [详细]

蜡笔小新 2024-11-12 14:40:40
instance
实验九：使用SharedPreferences存储简单数据

本实验旨在帮助学生理解和掌握使用SharedPreferences存储和读取简单数据的方法，包括程序参数和用户选项。 ... [详细]

蜡笔小新 2024-11-12 14:21:47
buffer
字节流(InputStream和OutputStream)，字节流读写文件，字节流的缓冲区，字节缓冲流

字节流抽象类InputStream和OutputStream是字节流的顶级父类所有的字节输入流都继承自InputStream，所有的输出流都继承子OutputStreamInput ... [详细]

蜡笔小新 2024-11-12 14:07:25
char
在 QQmlPropertyMap 的派生类中无法调用槽函数或 Q_INVOKABLE 方法？

在尝试对 QQmlPropertyMap 类进行测试驱动开发时，发现其派生类中无法正常调用槽函数或 Q_INVOKABLE 方法。这可能是由于 QQmlPropertyMap 的内部实现机制导致的，需要进一步研究以找到解决方案。 ... [详细]

蜡笔小新 2024-11-11 15:34:22
char
如何使用 `org.eclipse.rdf4j.query.impl.MapBindingSet.getValue()` 方法及其代码示例详解

如何使用 `org.eclipse.rdf4j.query.impl.MapBindingSet.getValue()` 方法及其代码示例详解 ... [详细]

蜡笔小新 2024-11-11 02:42:52
install
WordPress Duplicator 0.4.4 版本存在跨站脚本攻击漏洞分析

在对WordPress Duplicator插件0.4.4版本的安全评估中，发现其存在跨站脚本（XSS）攻击漏洞。此漏洞可能被利用进行恶意操作，建议用户及时更新至最新版本以确保系统安全。测试方法仅限于安全研究和教学目的，使用时需自行承担风险。漏洞编号：HTB23162。 ... [详细]

蜡笔小新 2024-11-10 13:16:43
instance
自定义 Android 圆形进度条视图，支持显示数字和中心文字

本文介绍了一种自定义的Android圆形进度条视图，支持在进度条上显示数字，并在圆心位置展示文字内容。通过自定义绘图和组件组合的方式实现，详细展示了自定义View的开发流程和关键技术点。示例代码和效果展示将在文章末尾提供。 ... [详细]

蜡笔小新 2024-11-10 13:04:42

Ray依依

这个家伙很懒，什么也没留下！

Tags | 热门标签

RankList | 热门文章