I have 10 million Lucene documents that look like this:
{ "0": 230, "1": 12, "2": 611, "3": 800 }
I'm trying to find all documents in which every field is less than 10. This is the Lucene code I have:
    BooleanQuery bq = new BooleanQuery();
    bq.Add(NumericRangeQuery.NewIntRange("0", 1, 10, true, true), Occur.MUST);
    bq.Add(NumericRangeQuery.NewIntRange("1", 1, 10, true, true), Occur.MUST);
    bq.Add(NumericRangeQuery.NewIntRange("2", 1, 10, true, true), Occur.MUST);
    //bq.Add(NumericRangeQuery.NewIntRange("3", 1, 1000, true, true), Occur.MUST);

    TopDocs hits = searcher.Search(bq, 10);
    int counter = 0;
    foreach (ScoreDoc scoreDoc in hits.ScoreDocs)
    {
        Lucene.Net.Documents.Document doc = searcher.Doc(scoreDoc.Doc);
        Console.WriteLine("3: " + doc.Get("3"));
        counter++;
    }
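The problem is that when I check all 4 properties to see whether every one of them is between 1 and 10, I get no results. When I check only the first 3 properties, I get correct results, but as soon as I add the fourth I get nothing. As you can see, the fourth boolean clause is commented out because it produces no results. I even widened the check on the fourth property to the whole range from 1 to 1000, and I still get no results. Am I doing something wrong? Here is how I build the index.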
    public static void BuildIndex()
    {
        Directory directory = FSDirectory.Open(new System.IO.DirectoryInfo("C:\\Users\\Luke\\Desktop\\1"));
        Analyzer analyzer = new Lucene.Net.Analysis.Standard.StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);
        IndexWriter writer = new IndexWriter(directory, analyzer, new IndexWriter.MaxFieldLength(100000));

        for (int x = 0; x < 10000000; x++)
        {
            Document doc = new Document();
            doc.Add(new NumericField("id", 100000, Field.Store.YES, true).SetIntValue(x));
            for (int i = 0; i < 5; i++)
            {
                doc.Add(new NumericField(i.ToString(), 100000, Field.Store.YES, true).SetIntValue(rand.Next(1, 1000)));
            }
            writer.AddDocument(doc);
            if (x % 500 == 0)
            {
                Console.WriteLine(x);
            }
        }

        writer.Optimize();
        writer.Flush(true, true, true);
        writer.Dispose();
        directory.Dispose();
        Console.WriteLine("done");
        Console.Read();
    }
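Before suspecting the query itself, it may help to estimate how many matches to expect at all. The sketch below is plain Java with no Lucene dependency. It assumes the field values are independent and uniformly distributed over 1..999 (which is what `rand.Next(1, 1000)` produces, since the upper bound is exclusive in .NET). The class and method names are hypothetical, chosen only for illustration.

```java
public class ExpectedHits {

    /**
     * Expected number of documents in which each of the first `fields` values
     * falls inside a window of `windowSize` values, assuming every value is
     * drawn independently and uniformly from `valueCount` possible values.
     */
    static double expectedHits(long docCount, int windowSize, int valueCount, int fields) {
        double perFieldProbability = (double) windowSize / valueCount;
        return docCount * Math.pow(perFieldProbability, fields);
    }

    public static void main(String[] args) {
        long docCount = 10_000_000L; // size of the index in the question
        int valueCount = 999;        // assumption: values uniform over 1..999
        int windowSize = 10;         // inclusive range [1, 10]

        for (int fields = 3; fields <= 4; fields++) {
            System.out.printf("%d fields in [1, 10]: about %.2f expected hits%n",
                    fields, expectedHits(docCount, windowSize, valueCount, fields));
        }
    }
}
```

Under these assumptions, three restricted fields give roughly 10 expected hits, while four give roughly 0.1, so an empty result for four clauses on [1, 10] would not be surprising on its own. It would not account for an empty result when the fourth clause spans 1 to 1000, since every document satisfies that range.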
Rushik.. 5
I just recreated this program in Java Lucene (4.4), and I don't see any problem with the numeric range query.
1) 3 documents
    field:0 - value:137  field:1 - value:41   field:2 - value:908  field:3 - value:871  field:4 - value:686
    field:0 - value:598  field:1 - value:623  field:2 - value:527  field:3 - value:364  field:4 - value:800
    field:0 - value:96   field:1 - value:301  field:2 - value:323  field:3 - value:94   field:4 - value:653
2) Indexer
    package com.numericrange;

    import java.io.File;
    import java.io.IOException;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.IntField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.IndexWriterConfig.OpenMode;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class IndexBuilder {

        /**
         * @param args
         * @throws IOException
         */
        public static void main(String[] args) throws IOException {
            Directory dir = FSDirectory.open(new File("/Users/Lucene/indexes"));
            IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_44, new StandardAnalyzer(Version.LUCENE_44));
            iwc.setOpenMode(OpenMode.CREATE);
            IndexWriter writer = new IndexWriter(dir, iwc);

            for (int x = 0; x < 3; x++) {
                Document doc = new Document();
                IntField iFldOut = new IntField("id", 6, Field.Store.YES);
                iFldOut.setIntValue(x);
                doc.add(iFldOut);

                // Five numeric fields named "0".."4", each holding a random value in 1..1000.
                for (int i = 0; i < 5; i++) {
                    int randomVal = (int) (Math.random() * 1000) + 1;
                    IntField iFld = new IntField(Integer.toString(i), 6, Field.Store.YES);
                    iFld.setIntValue(randomVal);
                    doc.add(iFld);
                    System.out.println("i:" + i + " - Random Value:" + randomVal);
                }
                writer.addDocument(doc);
            }

            int newNumDocs = writer.numDocs();
            System.out.println("************************");
            System.out.println(newNumDocs + " documents added.");
            System.out.println("************************");
            writer.close();
        }
    }
3) Searcher
    package com.numericrange;

    import java.io.File;
    import java.io.IOException;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.BooleanClause.Occur;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.NumericRangeQuery;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopScoreDocCollector;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class NumericQueryDemo {

        public static void main(String[] args) throws IOException, Exception {
            // Use indexes from the existing folder
            String dirPath = "/Users/Lucene/indexes";
            IndexReader reader = DirectoryReader.open(FSDirectory.open(new File(dirPath)));
            IndexSearcher searcher = new IndexSearcher(reader);
            Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_44);

            // Every clause is MUST, so a document has to satisfy all five ranges.
            BooleanQuery bq = new BooleanQuery();
            bq.add(NumericRangeQuery.newIntRange("0", 100, 600, true, true), Occur.MUST);
            bq.add(NumericRangeQuery.newIntRange("1", 40, 700, true, true), Occur.MUST);
            bq.add(NumericRangeQuery.newIntRange("2", 500, 1000, true, true), Occur.MUST);
            bq.add(NumericRangeQuery.newIntRange("3", 300, 900, true, true), Occur.MUST);
            bq.add(NumericRangeQuery.newIntRange("4", 600, 800, true, true), Occur.MUST);
            System.out.println("Query Data:" + bq.toString());

            TopScoreDocCollector collector = TopScoreDocCollector.create(500, true);
            long startTime = System.currentTimeMillis();
            searcher.search(bq, collector);
            System.out.println("Search Time: " + (System.currentTimeMillis() - startTime) + "ms");

            // Display results
            ScoreDoc[] hits = collector.topDocs().scoreDocs;
            System.out.println("Found " + hits.length + " hits.");
            for (int i = 0; i < hits.length; i++) {
                // The rest of this loop was cut off in the source; the body below is a
                // reconstruction inferred from the printed output, not the original code.
                Document d = searcher.doc(hits[i].doc);
                System.out.println((i + 1) + ". " + hits[i].score + " " + d.get("id")
                        + " ==== " + d.get("0") + " ==== " + d.get("1") + " ==== " + d.get("2")
                        + " ==== " + d.get("3") + " ==== " + d.get("4"));
            }
            reader.close();
        }
    }

4) Search results
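    Query Data:+0:[100 TO 600] +1:[40 TO 700] +2:[500 TO 1000] +3:[300 TO 900] +4:[600 TO 800]
    Search Time: 27ms
    Found 2 hits.
    1. 2.236068 0 ==== 137 ==== 41 ==== 908 ==== 871 ==== 686
    2. 2.236068 1 ==== 598 ==== 623 ==== 527 ==== 364 ==== 800

As you can see, I pass '6' as the second argument to `IntField`, intending it as the precisionStep. (In Lucene 4.4 that constructor argument is the field's initial value, which `setIntValue` then overwrites, so the fields are indexed with the default precisionStep. The query also uses the default, so the two still agree.) I verified through Luke that the documents were indexed correctly, and I ran the same query through Luke as well.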
Can you try running the query through the Luke interface? Change the values to match your documents.
+0:[100 TO 600] +1:[40 TO 700] +2:[500 TO 1000] +3:[300 TO 900] +4:[600 TO 800]
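As a sanity check that does not depend on Lucene or Luke, the expected matches for the three sample documents can be worked out by hand. The sketch below is plain Java that applies the same five inclusive ranges to the values listed under "1) 3 documents". The class and method names are hypothetical, and it only mirrors the semantics of an all-MUST boolean query; it is not how Lucene evaluates it internally.

```java
import java.util.ArrayList;
import java.util.List;

public class RangeConjunctionCheck {

    /** Returns the indexes of the documents whose every field lies in its inclusive [min, max] range. */
    static List<Integer> matchingDocs(int[][] docs, int[][] ranges) {
        List<Integer> matches = new ArrayList<>();
        for (int d = 0; d < docs.length; d++) {
            boolean allInRange = true;
            for (int f = 0; f < ranges.length; f++) {
                int value = docs[d][f];
                if (value < ranges[f][0] || value > ranges[f][1]) {
                    allInRange = false; // one failed MUST clause rejects the document
                    break;
                }
            }
            if (allInRange) {
                matches.add(d);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Field values "0".."4" of the three sample documents from the answer.
        int[][] docs = {
            {137, 41, 908, 871, 686},
            {598, 623, 527, 364, 800},
            {96, 301, 323, 94, 653},
        };
        // Inclusive ranges from the query string above.
        int[][] ranges = {{100, 600}, {40, 700}, {500, 1000}, {300, 900}, {600, 800}};

        System.out.println(matchingDocs(docs, ranges)); // prints [0, 1]
    }
}
```

The third document is rejected by the first clause alone (96 is below 100), which agrees with the two hits reported in the search results. Running the same kind of check against a handful of your own stored documents may show whether the four-clause query should match anything.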