Elasticsearch query to return all records

Author: 技术交流 | Source: Internet | 2023-05-19 09:17

I have a small database in Elasticsearch and for testing purposes would like to pull all records back. I am attempting to use a URL of the form...

http://localhost:9200/foo/_search?pretty=true&q={'matchAll':{''}}

Can someone give me the URL you would use to accomplish this, please?

16 answers

#1

507

I think Lucene syntax is supported, so:

http://localhost:9200/foo/_search?pretty=true&q=*:*

size defaults to 10, so you may also need &size=BIGNUMBER to get more than 10 items (where BIGNUMBER is a number you believe is bigger than your dataset).

BUT, for large result sets the Elasticsearch documentation suggests using the scan search type.

E.g.:

curl -XGET 'localhost:9200/foo/_search?search_type=scan&scroll=10m&size=50' -d '
{
    "query" : {
        "match_all" : {}
    }
}'

and then keep requesting, as the documentation link above suggests.

EDIT: scan was deprecated in 2.1.0.

scan does not provide any benefits over a regular scroll request sorted by _doc. link to elastic docs (spotted by @christophe-roussy)
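A regular scroll sorted by _doc, as the edit describes, could be set up roughly like this. It is only a Python sketch that builds the request and sends nothing; the helper name build_scroll_request is made up for illustration, and the host, index name, keep-alive and page size are assumptions carried over from the curl example.

```python
import json
from urllib.parse import urlencode


def build_scroll_request(index, keep_alive="10m", page_size=50,
                         host="http://localhost:9200"):
    """Return (url, body) for an initial scroll request sorted by _doc.

    Hypothetical helper: it only builds the request and sends nothing.
    """
    query_string = urlencode({"scroll": keep_alive})
    url = f"{host}/{index}/_search?{query_string}"
    body = {
        "size": page_size,
        "sort": ["_doc"],  # index order instead of the old scan search type
        "query": {"match_all": {}},
    }
    return url, json.dumps(body)


url, body = build_scroll_request("foo")
print(url)
print(body)
```

The returned pair can be handed to whatever HTTP client you already use; the follow-up requests then go to the scroll endpoint with the _scroll_id from each response.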

#2

http://127.0.0.1:9200/foo/_search/?size=1000&pretty=1
                                   ^

Note the size param, which increases the hits displayed from the default (10) to 1000.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-from-size.html

#3

Elasticsearch (ES) supports both GET and POST requests for getting data from an ES cluster index.

When we do a GET:

http://localhost:9200/[your index name]/_search?size=[no of records you want]&q=*:*

When we do a POST:

http://localhost:9200/[your_index_name]/_search
{
  "size": [your value], //default 10
  "from": [your start index], //default 0
  "query": {
    "match_all": {}
  }
}
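To make the GET/POST difference concrete, here is a small Python sketch using only the standard library. It builds the two requests without sending them; build_get and build_post are hypothetical names, and the host and index name foo are assumptions.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_get(index, size, host="http://localhost:9200"):
    """GET form: everything goes in the query string."""
    params = urlencode({"size": size, "q": "*:*"})
    return Request(f"{host}/{index}/_search?{params}", method="GET")


def build_post(index, size, start=0, host="http://localhost:9200"):
    """POST form: size, from and the query go in a JSON body."""
    body = {"size": size, "from": start, "query": {"match_all": {}}}
    return Request(
        f"{host}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


get_req = build_get("foo", 100)
post_req = build_post("foo", 100)
print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.full_url, post_req.data.decode("utf-8"))
```

Either object can be passed to urllib.request.urlopen once a cluster is actually reachable. Note that urlencode percent-encodes the *:* query, which the server decodes again.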

I would suggest using a UI plugin with Elasticsearch, such as elasticsearch-head (http://mobz.github.io/elasticsearch-head/). This will give you a better feel for the indices you create and also lets you test them.

#4

The query below returns the number of results given by NO_OF_RESULTS:

curl -XGET 'localhost:9200/foo/_search?size=NO_OF_RESULTS' -d '
{
"query" : {
    "match_all" : {}
  }
}'

Now, the question here is that you want all the records to be returned. So naturally, before writing a query, you won't know the value of NO_OF_RESULTS.

How do we know how many records exist in your index? Simply run the query below:

curl -XGET 'localhost:9200/foo/_search'

This would give you a result that looks like the one below:

{
  "hits" : {
  "total" :       2357,
  "hits" : [
    {
      ..................

The total field in the result tells you how many records are available in your index. So, that's a nice way to know the value of NO_OF_RESULTS.
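Reading that total out of a parsed response might look like the Python sketch below. The helper name total_hits is made up here. One thing the answer does not mention: from Elasticsearch 7 on, hits.total is an object rather than a plain number, so the sketch handles both shapes.

```python
def total_hits(response):
    """Return the total hit count from a parsed _search response.

    Older versions report hits.total as a plain number, as in the output
    above; Elasticsearch 7 and later report an object such as
    {"value": 2357, "relation": "eq"}.
    """
    total = response["hits"]["total"]
    if isinstance(total, dict):
        return total["value"]
    return total


old_style = {"hits": {"total": 2357, "hits": []}}
new_style = {"hits": {"total": {"value": 2357, "relation": "eq"}, "hits": []}}
print(total_hits(old_style), total_hits(new_style))
```

With the newer object form, a relation of "gte" means the value is only a lower bound, so treat it with care before using it as NO_OF_RESULTS.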

curl -XGET 'localhost:9200/_search'

Search all types in all indices

curl -XGET 'localhost:9200/foo/_search'

Search all types in the foo index

curl -XGET 'localhost:9200/foo1,foo2/_search'

Search all types in the foo1 and foo2 indices

curl -XGET 'localhost:9200/f*/_search'

Search all types in any indices beginning with f

curl -XGET 'localhost:9200/_all/type1,type2/_search'

Search types type1 and type2 in all indices

#5

This is the best solution I found using the Python client:

from elasticsearch import Elasticsearch

# Assumption: the answer does not show how `es` is created; this uses the
# client's default connection settings.
es = Elasticsearch()

# Initialize the scroll (note: search_type 'scan' was deprecated in 2.1.0)
page = es.search(
    index='yourIndex',
    doc_type='yourType',
    scroll='2m',
    search_type='scan',
    size=1000,
    body={
        # Your query's body
    })
sid = page['_scroll_id']
scroll_size = page['hits']['total']

# Start scrolling
while scroll_size > 0:
    print("Scrolling...")
    page = es.scroll(scroll_id=sid, scroll='2m')
    # Update the scroll ID
    sid = page['_scroll_id']
    # Get the number of results returned by the last scroll
    scroll_size = len(page['hits']['hits'])
    print("scroll size: " + str(scroll_size))
    # Do something with the obtained page

https://gist.github.com/drorata/146ce50807d16fd4a6aa

Using the Java client:

import static org.elasticsearch.index.query.QueryBuilders.*;

QueryBuilder qb = termQuery("multi", "test");

SearchResponse scrollResp = client.prepareSearch(test)
        .addSort(FieldSortBuilder.DOC_FIELD_NAME, SortOrder.ASC)
        .setScroll(new TimeValue(60000))
        .setQuery(qb)
        .setSize(100).execute().actionGet(); //100 hits per shard will be returned for each scroll
//Scroll until no hits are returned
do {
    for (SearchHit hit : scrollResp.getHits().getHits()) {
        //Handle the hit...
    }

    scrollResp = client.prepareSearchScroll(scrollResp.getScrollId()).setScroll(new TimeValue(60000)).execute().actionGet();
} while(scrollResp.getHits().getHits().length != 0); // Zero hits mark the end of the scroll and the while loop.

https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/java-search-scrolling.html

#6

You can also use server:9200/_stats to get statistics about all your indices, such as the size and number of documents per index. That's very useful information to have.

#7

Simple! You can use the size and from parameters:

http://localhost:9200/[your index name]/_search?size=1000&from=0

Then increase from step by step until you have all of the data.
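Stepping from forward can be automated along these lines. This is a Python sketch, not a finished client: fetch_page is a hypothetical callable you would supply (it runs the search with the given from and size and returns the parsed JSON), and a fake one stands in for a real index so the example runs on its own.

```python
def iter_all_hits(fetch_page, page_size=1000):
    """Yield every hit by advancing `from` until a page comes back empty.

    `fetch_page(start, size)` is a hypothetical callable supplied by the
    caller; it should run the search with from=start&size=size and return
    the parsed JSON response.
    """
    start = 0
    while True:
        hits = fetch_page(start, page_size)["hits"]["hits"]
        if not hits:
            break
        yield from hits
        start += len(hits)


# Stand-in for a real index so the sketch runs without a cluster.
fake_docs = [{"_id": str(i)} for i in range(2500)]


def fake_fetch(start, size):
    return {"hits": {"hits": fake_docs[start:start + size]}}


print(len(list(iter_all_hits(fake_fetch))))
```

Because results are only stable between pages if nothing is indexed in the meantime, this approach is best kept for small test datasets like the one in the question.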

#8

The best way to adjust the size is to add size=number to the URL's query string:

curl -XGET "http://localhost:9200/logstash-*/_search?size=50&pretty"

Note: the maximum value that can be set for size is 10000. For any value above ten thousand, Elasticsearch expects you to use the scroll function, which minimises the impact on performance.
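A quick way to check whether a request stays under that limit is sketched below in Python. Two details are added here rather than taken from the answer: the limit applies to from + size combined, and it comes from the index setting index.max_result_window, whose default is 10000. The function name is made up.

```python
MAX_RESULT_WINDOW = 10_000  # default value of index.max_result_window


def fits_result_window(start, size, max_window=MAX_RESULT_WINDOW):
    """True if a from/size request stays within the result window."""
    return start + size <= max_window


print(fits_result_window(0, 10_000))
print(fits_result_window(9_500, 1_000))
```

When the check fails, that is the point to switch to scroll instead of raising size further.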

#9

Elasticsearch will get significantly slower if you just pass some big number as size. One way to get all documents is to use scan and scroll IDs.

So your call would be:

GET /foo/_search?search_type=scan&scroll=1m
{
    "query": { "match_all": {}},
    "size":  1000
}

This will return a _scroll_id, which you can now use to get the first batch of documents.

https://www.elastic.co/guide/en/elasticsearch/guide/current/scan-scroll.html

#10

http://localhost:9200/foo/_search/?size=1000&pretty=1

You will need to specify the size query parameter, as the default is 10.

#11

You can use the _count API to get the value for the size parameter:

http://localhost:9200/foo/_count?q=

Returns {count:X, ...}. Extract value 'X' and then do the actual query:

http://localhost:9200/foo/_search?q=&size=X
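The two steps can be chained as in the Python sketch below. It assumes a hypothetical get_json(url) callable that performs the GET and returns parsed JSON; canned responses stand in for a real cluster so the example runs on its own.

```python
def fetch_all_by_count(get_json, index, host="http://localhost:9200"):
    """Ask _count for the number of documents, then request that many hits.

    `get_json(url)` is a hypothetical callable that performs the GET and
    returns the parsed JSON response.
    """
    count = get_json(f"{host}/{index}/_count")["count"]
    if count == 0:
        return []
    response = get_json(f"{host}/{index}/_search?size={count}")
    return response["hits"]["hits"]


# Canned responses stand in for a real cluster.
canned = {
    "http://localhost:9200/foo/_count": {"count": 3},
    "http://localhost:9200/foo/_search?size=3": {
        "hits": {"hits": [{"_id": "1"}, {"_id": "2"}, {"_id": "3"}]}
    },
}
docs = fetch_all_by_count(canned.__getitem__, "foo")
print(len(docs))
```

Documents indexed between the two calls will be missed, so this is only a reasonable fit for a small, quiet test index.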

#12

A few of the answers correctly suggest scan and scroll, but I could not find a complete answer that just works. To pull all records, first run the following curl command:

curl -XGET 'http://ip1:9200/myindex/_search?scroll=1m' -d '
{
    "query": {
            "match_all" : {}
    }
}
'

But we are not done here. The output of the above curl command would be something like this:

{"_scroll_id":"c2Nhbjs1OzUyNjE6NU4tU3BrWi1UWkNIWVNBZW43bXV3Zzs1Mzc3OkhUQ0g3VGllU2FhemJVNlM5d2t0alE7NTI2Mjo1Ti1TcGtaLVRaQ0hZU0FlbjdtdXdnOzUzNzg6SFRDSDdUaWVTYWF6YlU2Uzl3a3RqUTs1MjYzOjVOLVNwa1otVFpDSFlTQWVuN211d2c7MTt0b3RhbF9oaXRzOjIyNjAxMzU3Ow==","took":109,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":22601357,"max_score":0.0,"hits":[]}}

It's important to keep the _scroll_id handy, because next you should run the following command:

    curl -XGET  'localhost:9200/_search/scroll'  -d'
    {
        "scroll" : "1m", 
        "scroll_id" : "c2Nhbjs2OzM0NDg1ODpzRlBLc0FXNlNyNm5JWUc1" 
    }
    '

However, I don't think it's easy to run this manually. Your best bet is to write Java code to do the same:

    private TransportClient client = null;
    private Settings settings = ImmutableSettings.settingsBuilder()
                  .put(CLUSTER_NAME,"cluster-test").build();
    private SearchResponse scrollResp  = null;

    this.client = new TransportClient(settings);
    this.client.addTransportAddress(new InetSocketTransportAddress("ip", port));

    QueryBuilder queryBuilder = QueryBuilders.matchAllQuery();
    scrollResp = client.prepareSearch(index).setSearchType(SearchType.SCAN)
                 .setScroll(new TimeValue(60000))                            
                 .setQuery(queryBuilder)
                 .setSize(100).execute().actionGet();

    scrollResp = client.prepareSearchScroll(scrollResp.getScrollId())
                .setScroll(new TimeValue(timeVal))
                .execute()
                .actionGet();

Now loop on the last command and use the SearchResponse to extract the data.
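For comparison, the same loop can be sketched in a few lines of Python. This is not the Java code above: it uses a plain scroll sorted by _doc rather than the scan search type, so the first response already contains hits. post_json is a hypothetical callable that sends a JSON body to a path and returns the parsed response; a fake one is used here so the example runs without a cluster.

```python
def scroll_all(post_json, index, keep_alive="1m", page_size=100):
    """Yield every hit of a match_all scroll over `index`.

    `post_json(path, body)` is a hypothetical callable that sends the JSON
    body to the given path and returns the parsed response.
    """
    page = post_json(
        f"/{index}/_search?scroll={keep_alive}",
        {"size": page_size, "sort": ["_doc"], "query": {"match_all": {}}},
    )
    while page["hits"]["hits"]:
        yield from page["hits"]["hits"]
        # Always use the most recent _scroll_id; it may change between pages.
        page = post_json(
            "/_search/scroll",
            {"scroll": keep_alive, "scroll_id": page["_scroll_id"]},
        )


# Fake transport serving 250 documents so the sketch runs without a cluster.
docs = [{"_id": str(i)} for i in range(250)]
cursor = {"pos": 0, "size": 0}


def fake_post(path, body):
    if path != "/_search/scroll":
        cursor["pos"] = 0
        cursor["size"] = body["size"]
    start = cursor["pos"]
    cursor["pos"] = start + cursor["size"]
    return {"_scroll_id": "fake-id",
            "hits": {"hits": docs[start:start + cursor["size"]]}}


print(sum(1 for _ in scroll_all(fake_post, "myindex")))
```

The loop ends on the first empty page, which is the same stop condition the Java client example in answer #5 uses.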

#13

The size param increases the hits displayed from the default (10) to 500.

http://localhost:9200/[indexName]/_search?pretty=true&size=500&q=*:*

Change from step by step to get all the data:

http://localhost:9200/[indexName]/_search?size=500&from=0

#14

To return all records from all indices you can do:

curl -XGET 'http://35.195.120.21:9200/_all/_search?size=50&pretty'

Output:

  "took" : 866,
  "timed_out" : false,
  "_shards" : {
    "total" : 25,
    "successful" : 25,
    "failed" : 0
  },
  "hits" : {
    "total" : 512034694,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "grafana-dash",
      "_type" : "dashboard",
      "_id" : "test",
      "_score" : 1.0,
       ...

#15

curl -XGET '{{IP/localhost}}:9200/{{Index name}}/{{type}}/_search?scroll=10m&pretty' -d '{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      }
    }
  }
}'

#16

-2

You can use size=0, but note that this returns only the total hit count and no documents. Example:

curl -XGET 'localhost:9200/index/type/_search' -d '
{
   "size": 0,
   "query" : {
   "match_all" : {}
    }
}'
