How to Get All Results from Elasticsearch in Python

I am brand new to using Elasticsearch and I'm having an issue getting all results back when I run an Elasticsearch query through my Python script. My goal is to query an index ("my_index" below), take those results, and put them into a pandas DataFrame which goes through a Django app and eventually ends up in a Word document.

My code is:

from elasticsearch import Elasticsearch

es = Elasticsearch()
logs_index = "my_index"
logs = es.search(index=logs_index, body=my_query)

and it tells me I have 72 hits, but then when I do:

df = logs['hits']['hits']
len(df)

It says the length is only 10. I saw that someone had a similar issue in another question, but their solution did not work for me:

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
es = Elasticsearch()
logs_index = "my_index"
search = Search(using=es)
total = search.count()
search = search[0:total]
logs = es.search(index=logs_index,body=my_query)
len(logs['hits']['hits'])

The len function still says I only have 10 results. What am I doing wrong, or what else can I do to get all 72 results back?

ETA: I am aware that I can just add "size": 10000 to my query to stop it from truncating to just 10, but since the user will be entering their search query I need to find another way that isn't just in the search query.

You need to pass a size parameter to your es.search() call.

Please read the API Docs

size – Number of hits to return (default: 10)

An example:

es.search(index=logs_index, body=my_query, size=1000)
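As a minimal sketch of the call above, the helper below passes size as a keyword argument and then checks whether the response actually contains every matching document. The function name search_with_size and its return shape are hypothetical and not part of the client; es is assumed to be an Elasticsearch client like the one in the question.

```python
def search_with_size(es, index, query, size=1000):
    """Run one search with an explicit size and report whether it was enough.

    `es` is assumed to be an elasticsearch.Elasticsearch client.
    `size` is the maximum number of hits to return (the default is 10).
    """
    resp = es.search(index=index, body=query, size=size)
    hits = resp["hits"]["hits"]

    # hits.total is a plain integer before Elasticsearch 7 and a
    # {"value": ..., "relation": ...} object from 7.0 onwards.
    total = resp["hits"]["total"]
    if isinstance(total, dict):
        total = total["value"]

    # True when the query matched more documents than were returned.
    truncated = len(hits) < total
    return hits, total, truncated
```

With the question's names, hits, total, truncated = search_with_size(es, logs_index, my_query) leaves the user's query untouched, and truncated tells you when the chosen size was too small.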

Please note that this is not an optimal way to retrieve every document in an index, or the results of a query that matches a large number of documents. For that you should use a scroll operation, which the Python client exposes through the scan() helper; it is also documented in the API docs.

You can also read about it in the Elasticsearch documentation.
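To make the scroll operation concrete, here is a rough sketch of the loop that a scroll-based helper performs, written against the client's search(), scroll(), and clear_scroll() methods. The function name scroll_all_hits and the page_size and keep_alive defaults are assumptions chosen for illustration.

```python
def scroll_all_hits(es, index, query, page_size=1000, keep_alive="2m"):
    """Yield every hit for `query`, fetching one page per request.

    `es` is assumed to be an elasticsearch.Elasticsearch client.
    `keep_alive` is how long the server keeps the scroll context
    open between requests.
    """
    # The first request opens the scroll context and returns the first page.
    resp = es.search(index=index, body=query, scroll=keep_alive, size=page_size)
    scroll_id = resp["_scroll_id"]
    try:
        while resp["hits"]["hits"]:
            for hit in resp["hits"]["hits"]:
                yield hit
            # Each follow-up request returns the next page of the same search.
            resp = es.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = resp["_scroll_id"]
    finally:
        # Release the server-side scroll context when done or on error.
        es.clear_scroll(scroll_id=scroll_id)
```

In practice, elasticsearch.helpers.scan(es, index=logs_index, query=my_query) wraps this kind of loop for you, so a hand-written version is mainly useful for understanding what happens underneath.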

Either set the size explicitly (if the number of documents is relatively small) or use the scan function, which gives you cursor-like iteration over a large number of documents.

Scan

It is also possible to use the elasticsearch_dsl library:

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
import pandas as pd

client = Elasticsearch()
s = Search(using=client, index="my_index")

df = pd.DataFrame([hit.to_dict() for hit in s.scan()])

The key here is s.scan(), which handles pagination for you and iterates over every matching document.

Note that the example above returns the entire index, since no query was passed to it. To build a query with elasticsearch_dsl, see that library's documentation.
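If you stay with the plain client instead of elasticsearch_dsl, each raw hit is a dictionary with the document stored under "_source". The sketch below flattens such hits into rows that pandas can consume; hits_to_rows is a hypothetical helper name, and keeping "_id" as an extra column is an assumption about what is useful in the DataFrame.

```python
def hits_to_rows(hits):
    """Turn raw Elasticsearch hits into a list of flat dictionaries.

    Each row holds the fields of the hit's "_source" plus its "_id".
    """
    rows = []
    for hit in hits:
        # Copy the source so the original response is not modified.
        row = dict(hit.get("_source", {}))
        row["_id"] = hit.get("_id")
        rows.append(row)
    return rows
```

With the question's variable names, pd.DataFrame(hits_to_rows(logs['hits']['hits'])) would then build the DataFrame.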

By default, a search returns only 10 results. With elasticsearch_dsl, the simplest way to request more is to slice the Search object: search[0:100] returns the first 100 hits, and slicing from zero to the total number of hits (i.e. search.count()) returns all of them. The slice applies to the Search object itself, so run the search with search.execute() rather than through a separate es.search() call.

Comments
  • can you please clarify your last edit? I'm not sure what the search query has to do with the size parameter. Are you referring to the problem of not knowing how many results are being returned by the query VS the static size you might define?
  • since it's your first post, please read this so you know how to react to answers: stackoverflow.com/help/someone-answers
  • I thought the size could only be within my_query, thank you for clarifying! I know it's not the best practice, but I need it to just work for now and I can look into scroll later. Thank you!
  • If for some reason you need to implement a basic (and not advisable) client-side scroll, you can also use the from parameter, which defines the starting offset of your results and basically lets you paginate them.
  • @AlexandreJuma is it possible to add from? I am trying to add it and Python is giving me a SyntaxError, probably because from is a reserved keyword in Python.
  • @thakurinbox, for compatibility with the Python ecosystem we use from_ instead of from and doc_type instead of type as parameter names
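The client-side pagination described in these comments could look roughly like the sketch below, which uses the from_ keyword together with size. The function name paginate_hits and the default page_size are assumptions for illustration. Keep in mind that from_ plus size cannot exceed the index's index.max_result_window setting (10,000 by default), so this approach only suits modest result sets.

```python
def paginate_hits(es, index, query, page_size=100):
    """Yield hits page by page using from_/size offsets.

    `es` is assumed to be an elasticsearch.Elasticsearch client.
    """
    offset = 0
    while True:
        # from_ is the Python-safe spelling of the "from" parameter.
        resp = es.search(index=index, body=query, from_=offset, size=page_size)
        hits = resp["hits"]["hits"]
        if not hits:
            break
        for hit in hits:
            yield hit
        # A short page means there is nothing left to fetch.
        if len(hits) < page_size:
            break
        offset += page_size
```

Calling list(paginate_hits(es, logs_index, my_query)) collects every page into a single list.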