I have a small database in Elasticsearch and for testing purposes would like to pull all records back. I am attempting to use a URL of the form...
http://localhost:9200/foo/_search?pretty=true&q={'matchAll':{''}}
Can someone give me the URL you would use to accomplish this, please?
507
I think Lucene syntax is supported, so:
http://localhost:9200/foo/_search?pretty=true&q=*:*
size defaults to 10, so you may also need &size=BIGNUMBER to get more than 10 items (where BIGNUMBER is a number you believe is bigger than your dataset).
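As a small illustration of the query-string form above, here is a sketch in Python that assembles such a URL. The helper name `build_search_url` and the default host are assumptions made for this example; they are not part of any Elasticsearch client.

```python
from urllib.parse import urlencode


def build_search_url(index, size=10, query="*:*", host="http://localhost:9200"):
    """Build a URI-search URL (hypothetical helper, standard library only)."""
    # safe="*:" keeps the Lucene match-all query readable as q=*:*
    params = urlencode({"pretty": "true", "q": query, "size": size}, safe="*:")
    return f"{host}/{index}/_search?{params}"


print(build_search_url("foo", size=1000))
```

Pass a `size` larger than the number of documents you expect in the index.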
BUT, for large result sets, the Elasticsearch documentation suggests using the scan search type.
For example:
curl -XGET 'localhost:9200/foo/_search?search_type=scan&scroll=10m&size=50' -d '
{
"query" : {
"match_all" : {}
}
}'
and then keep requesting, as described in the documentation linked above.
EDIT: scan was deprecated in 2.1.0. scan does not provide any benefits over a regular scroll request sorted by _doc. Link to Elastic docs (spotted by @christophe-roussy).
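A regular scroll sorted by _doc could be sketched in Python roughly as follows. This is only an outline under stated assumptions: `send_search` and `send_scroll` are hypothetical callables that you supply (for instance thin wrappers around an HTTP library or a client), each returning the parsed JSON response.

```python
def scroll_all(send_search, send_scroll, page_size=1000, keep_alive="1m"):
    """Yield every hit using a regular scroll sorted by _doc.

    send_search(body, keep_alive) and send_scroll(scroll_id, keep_alive)
    are assumed callables that send the request and return parsed JSON.
    """
    body = {"query": {"match_all": {}}, "sort": ["_doc"], "size": page_size}
    page = send_search(body, keep_alive)
    # An empty hits list marks the end of the scroll.
    while page["hits"]["hits"]:
        for hit in page["hits"]["hits"]:
            yield hit
        # Always use the most recent scroll ID for the next request.
        page = send_scroll(page["_scroll_id"], keep_alive)
```

Injecting the two callables keeps the loop independent of any particular client version.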
99
http://127.0.0.1:9200/foo/_search/?size=1000&pretty=1
Note the size param, which increases the number of hits returned from the default (10) to 1000.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-from-size.html
22
Elasticsearch (ES) supports both GET and POST requests for getting data from an index in the ES cluster.
When we do a GET:
http://localhost:9200/[your index name]/_search?size=[no of records you want]&q=*:*
When we do a POST:
http://localhost:9200/[your_index_name]/_search
{
  "size": [your value], //default 10
  "from": [your start index], //default 0
  "query": {
    "match_all": {}
  }
}
I would suggest using a UI plugin with Elasticsearch, such as http://mobz.github.io/elasticsearch-head/. It will help you get a better feel for the indices you create and let you test them.
15
The query below returns NO_OF_RESULTS records, where NO_OF_RESULTS is the number of results you would like returned.
curl -XGET 'localhost:9200/foo/_search?size=NO_OF_RESULTS' -d '
{
"query" : {
"match_all" : {}
}
}'
Now, the question here is that you want all the records to be returned. So naturally, before writing the query, you won't know the value of NO_OF_RESULTS.
How do we know how many records exist in your index? Simply run the query below:
curl -XGET 'localhost:9200/foo/_search'
This would give you a result that looks like the one below:
{
"hits" : {
"total" : 2357,
"hits" : [
{
..................
The total field in the result tells you how many records are available in your index. So that's a nice way to find the value of NO_OF_RESULTS.
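Reading the total out of a response can be sketched in Python as below. The helper name `total_hits` and the sample string are illustrative assumptions; the sample only mimics the shape of the result shown above.

```python
import json


def total_hits(response_text):
    """Return hits.total from a raw _search response body."""
    total = json.loads(response_text)["hits"]["total"]
    # Elasticsearch 7+ reports an object such as {"value": 2357, "relation": "eq"};
    # older versions report a plain integer.
    if isinstance(total, dict):
        return total["value"]
    return total


sample = '{"hits": {"total": 2357, "hits": []}}'
print(total_hits(sample))
```

The returned number can then be used as the size value in the follow-up request.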
curl -XGET 'localhost:9200/_search'
Search all types in all indices
curl -XGET 'localhost:9200/foo/_search'
Search all types in the foo index
curl -XGET 'localhost:9200/foo1,foo2/_search'
Search all types in the foo1 and foo2 indices
curl -XGET 'localhost:9200/f*/_search'
Search all types in any indices beginning with f
curl -XGET 'localhost:9200/_all/type1,type2/_search'
Search types type1 and type2 in all indices
11
This is the best solution I found using the Python client:
from elasticsearch import Elasticsearch

# Assumption: a node on localhost:9200 and an older client that still accepts
# doc_type and search_type='scan'
es = Elasticsearch()

# Initialize the scroll
page = es.search(
    index='yourIndex',
    doc_type='yourType',
    scroll='2m',
    search_type='scan',
    size=1000,
    body={
        # Your query's body
    })
sid = page['_scroll_id']
scroll_size = page['hits']['total']

# Start scrolling
while scroll_size > 0:
    print("Scrolling...")
    page = es.scroll(scroll_id=sid, scroll='2m')
    # Update the scroll ID
    sid = page['_scroll_id']
    # Get the number of results that we returned in the last scroll
    scroll_size = len(page['hits']['hits'])
    print("scroll size: " + str(scroll_size))
    # Do something with the obtained page
https://gist.github.com/drorata/146ce50807d16fd4a6aa
Using the Java client:
import static org.elasticsearch.index.query.QueryBuilders.*;

QueryBuilder qb = termQuery("multi", "test");
SearchResponse scrollResp = client.prepareSearch(test)
        .addSort(FieldSortBuilder.DOC_FIELD_NAME, SortOrder.ASC)
        .setScroll(new TimeValue(60000))
        .setQuery(qb)
        .setSize(100).execute().actionGet(); //100 hits per shard will be returned for each scroll
//Scroll until no hits are returned
do {
    for (SearchHit hit : scrollResp.getHits().getHits()) {
        //Handle the hit...
    }
    scrollResp = client.prepareSearchScroll(scrollResp.getScrollId()).setScroll(new TimeValue(60000)).execute().actionGet();
} while(scrollResp.getHits().getHits().length != 0); // Zero hits mark the end of the scroll and the while loop.
https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/java-search-scrolling.html
10
You can also use server:9200/_stats to get statistics about all your aliases, such as the size and number of elements per alias. That's very useful and provides helpful information.
8
Simple! You can use the size and from parameters!
http://localhost:9200/[your index name]/_search?size=1000&from=0
Then you increase from step by step (by size each time) until you have all of the data.
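The stepping can be sketched in Python like this. The helper `page_offsets`, the index name foo, and the total of 2357 are assumptions chosen for illustration; in practice the total would come from a previous response.

```python
def page_offsets(total, page_size=1000):
    """Yield (from, size) pairs that together cover `total` records."""
    for start in range(0, total, page_size):
        # The last page may be smaller than page_size.
        yield start, min(page_size, total - start)


for start, size in page_offsets(2357):
    print(f"http://localhost:9200/foo/_search?size={size}&from={start}")
```

Each printed URL fetches one page; requesting them in order covers the whole result set as long as the index does not change in between.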
5
The best way to adjust the size is to add size=number to the URL's query string:
curl -XGET "http://localhost:9200/logstash-*/_search?size=50&pretty"
Note: the maximum value for size is 10000 (more precisely, from + size may not exceed index.max_result_window, which defaults to 10000). For anything above ten thousand, you are expected to use the scroll API, which minimizes the impact on performance.
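As a rough sketch of that rule in Python: the function name `pick_strategy` is hypothetical, and the constant assumes the index still uses the default window of 10000.

```python
MAX_RESULT_WINDOW = 10000  # default limit; assumed unchanged on the index


def pick_strategy(total, max_window=MAX_RESULT_WINDOW):
    """Return 'from_size' if plain paging can reach every record, else 'scroll'."""
    return "from_size" if total <= max_window else "scroll"


print(pick_strategy(2357))
print(pick_strategy(22601357))
```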
4
Elasticsearch gets significantly slower if you just pass some big number as size. One way to get all documents is to use scan and scroll IDs.
So your call would be:
GET /foo/_search?search_type=scan&scroll=1m
{
"query": { "match_all": {}},
"size": 1000
}
This will return a _scroll_id, which you can now use to get the first batch of documents.
https://www.elastic.co/guide/en/elasticsearch/guide/current/scan-scroll.html
4
http://localhost:9200/foo/_search/?size=1000&pretty=1
You will need to specify the size query parameter, as the default is 10.
4
You can use the _count API to get the value for the size parameter:
http://localhost:9200/foo/_count?q=
Returns {count:X, ...}. Extract the value 'X' and then do the actual query:
http://localhost:9200/foo/_search?q=&size=X
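The two-step approach could look roughly like this in Python. It is a sketch under assumptions: `get_json` is a hypothetical callable you provide that performs the GET and returns the parsed JSON, and the host and index defaults are placeholders.

```python
def fetch_all(get_json, index="foo", host="http://localhost:9200"):
    """Ask _count for the number of documents, then request that many hits.

    get_json(url) is an assumed callable that performs the GET request
    and returns the parsed JSON response as a dict.
    """
    count = get_json(f"{host}/{index}/_count")["count"]
    if count == 0:
        return []
    result = get_json(f"{host}/{index}/_search?size={count}")
    return result["hits"]["hits"]
```

With the standard library, `get_json` could be something like `lambda url: json.load(urllib.request.urlopen(url))`. This only suits small indices, since the count can change between the two requests and very large size values may be rejected.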
2
A few of the answers correctly suggest scan and scroll, but I could not find a complete answer that works out of the box. To pull all records, run the following curl command:
curl -XGET 'http://ip1:9200/myindex/_search?scroll=1m' -d '
{
"query": {
"match_all" : {}
}
}
'
But we are not done here. The output of the above curl command would be something like this:
{"_scroll_id":"c2Nhbjs1OzUyNjE6NU4tU3BrWi1UWkNIWVNBZW43bXV3Zzs1Mzc3OkhUQ0g3VGllU2FhemJVNlM5d2t0alE7NTI2Mjo1Ti1TcGtaLVRaQ0hZU0FlbjdtdXdnOzUzNzg6SFRDSDdUaWVTYWF6YlU2Uzl3a3RqUTs1MjYzOjVOLVNwa1otVFpDSFlTQWVuN211d2c7MTt0b3RhbF9oaXRzOjIyNjAxMzU3Ow==","took":109,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":22601357,"max_score":0.0,"hits":[]}}
It's important to keep the _scroll_id handy, because next you should run the following command:
curl -XGET 'localhost:9200/_search/scroll' -d'
{
"scroll" : "1m",
"scroll_id" : "c2Nhbjs2OzM0NDg1ODpzRlBLc0FXNlNyNm5JWUc1"
}
'
However, I don't think it's easy to run this manually. Your best bet is to write Java code that does the same:
private TransportClient client = null;
private Settings settings = ImmutableSettings.settingsBuilder()
        .put(CLUSTER_NAME,"cluster-test").build();
private SearchResponse scrollResp = null;

this.client = new TransportClient(settings);
this.client.addTransportAddress(new InetSocketTransportAddress("ip", port));

QueryBuilder queryBuilder = QueryBuilders.matchAllQuery();
scrollResp = client.prepareSearch(index).setSearchType(SearchType.SCAN)
        .setScroll(new TimeValue(60000))
        .setQuery(queryBuilder)
        .setSize(100).execute().actionGet();

scrollResp = client.prepareSearchScroll(scrollResp.getScrollId())
        .setScroll(new TimeValue(timeVal))
        .execute()
        .actionGet();
Now loop on the last command and use each SearchResponse to extract the data.
1
The size param increases the number of hits returned from the default (10) to 500.
http://localhost:9200/[indexName]/_search?pretty=true&size=500&q=*:*
Increase the from value step by step to get all the data.
http://localhost:9200/[indexName]/_search?size=500&from=0
0
To return all records from all indices you can do:
curl -XGET 'http://35.195.120.21:9200/_all/_search?size=50&pretty'
Output:
"took" : 866,
"timed_out" : false,
"_shards" : {
"total" : 25,
"successful" : 25,
"failed" : 0
},
"hits" : {
"total" : 512034694,
"max_score" : 1.0,
"hits" : [ {
"_index" : "grafana-dash",
"_type" : "dashboard",
"_id" : "test",
"_score" : 1.0,
...
0
curl -XGET '{{IP/localhost}}:9200/{{Index name}}/{{type}}/_search?scroll=10m&pretty' -d '{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      }
    }
  }
}'
-2
You can use size=0 to get just the total hit count (hits.total) without returning any documents, which tells you the value to use for size. Example:
curl -XGET 'localhost:9200/index/type/_search' -d '
{
"size": 0,
"query" : {
"match_all" : {}
}
}'