Quickly indexing a large dataset in Solr

I have a few million records that need to be indexed in Solr. Once indexed, they will not change, and the collections are used only for reads. I currently post XML docs to the REST API, and it works fine, although it takes some time (the configs are optimized for reads and caching).

But I was wondering: is there a better/faster approach, maybe one that avoids the HTTP/network layer? Something like building the collection locally, copying it to the Solr server, and then adding/swapping the collection?

One option could be a custom DIH for a second/backup core, swapping when done. But that would mean "eating" memory that Solr uses for caching, which would slow down searches.

I am hoping for a disconnected solution: something like a command-line tool running on a different machine with a configuration optimized for writing, after which I copy the core to production and swap the old one for the new one.

Any ideas?

1 solution

#1


A few million records should not be an issue.

Check how often you commit, and consider disabling soft commits or making the soft-commit interval much longer.
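
As a rough sketch of the commit advice, the snippet below builds Config API requests that turn off automatic soft commits and keep a periodic hard commit that does not open a new searcher. The base URL, the core name `mycore`, and the 60-second interval are assumptions for illustration, not values from the question; sending one property per request is just a conservative choice here.

```python
import json
import urllib.request

SOLR_URL = "http://localhost:8983/solr"  # assumed Solr base URL
CORE = "mycore"                          # hypothetical core name


def bulk_load_settings():
    """Return one Config API payload per property to change."""
    return [
        # -1 disables automatic soft commits
        {"set-property": {"updateHandler.autoSoftCommit.maxTime": -1}},
        # hard commit every 60 s (illustrative value) ...
        {"set-property": {"updateHandler.autoCommit.maxTime": 60000}},
        # ... without opening a new searcher each time
        {"set-property": {"updateHandler.autoCommit.openSearcher": False}},
    ]


def config_request(base_url, core, payload):
    """Build (but do not send) a POST request to the core's /config endpoint."""
    return urllib.request.Request(
        f"{base_url}/{core}/config",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def apply_settings(base_url=SOLR_URL, core=CORE):
    """Send every payload to a running Solr and return the HTTP status codes."""
    statuses = []
    for payload in bulk_load_settings():
        with urllib.request.urlopen(config_request(base_url, core, payload)) as resp:
            statuses.append(resp.status)
    return statuses
```

With settings like these, the loader would issue a single explicit commit once all documents have been sent, so the new data becomes searchable only at the end of the bulk load.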

You can also send documents to a single Solr instance from multiple clients at once to get some multi-threading benefit.
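
One way to picture the multi-client idea is a single loader that posts batches from several threads. This is only a sketch: the update URL, the core name, the batch size, and the worker count are assumptions, and the `send` parameter is a hypothetical hook so the batching logic can be exercised without a running Solr.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

UPDATE_URL = "http://localhost:8983/solr/mycore/update"  # assumed endpoint and core name


def batches(docs, size):
    """Yield lists of at most `size` documents from any iterable."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def post_batch(batch, url=UPDATE_URL):
    """POST one batch as a JSON array of documents; no commit is requested."""
    req = urllib.request.Request(
        url,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def index_parallel(docs, send=post_batch, batch_size=1000, workers=4):
    """Send batches from several threads; results come back in batch order.

    Executor.map submits every batch up front, so all batches are held in
    memory at once. That is acceptable for a sketch but worth bounding for
    very large inputs.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(send, batches(docs, batch_size)))
```

Calling `index_parallel(my_docs)` against a live instance would then be followed by one explicit commit, in line with the commit advice above.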

And you can certainly write a small SolrJ client to index into a local/embedded core and then swap that core into production.
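
The SolrJ indexer itself would be written in Java, but the final swap step is a plain CoreAdmin call and can be sketched in a few lines. The base URL and the core names `live` and `staging` below are hypothetical; the sketch assumes the freshly built core has already been copied to the production server and registered there.

```python
import urllib.request
from urllib.parse import urlencode

SOLR_URL = "http://localhost:8983/solr"  # assumed Solr base URL


def swap_url(base_url, live_core, staged_core):
    """Build the CoreAdmin SWAP URL that exchanges the two cores' names."""
    query = urlencode({"action": "SWAP", "core": live_core, "other": staged_core})
    return f"{base_url}/admin/cores?{query}"


def swap_cores(live_core="live", staged_core="staging", base_url=SOLR_URL):
    """Issue the swap against a running Solr and return the HTTP status."""
    with urllib.request.urlopen(swap_url(base_url, live_core, staged_core)) as resp:
        return resp.status
```

After the swap, queries addressed to the `live` name are served by the newly built index, and the old index remains available under the `staging` name if a rollback is needed.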

