Python for (some) Elasticsearch queries
This post is a quick roundup of the most common ES queries you can run through the low-level Python client, elasticsearch.
Assuming you have an Elasticsearch cluster running somewhere, locally or remotely, you connect to it with the client as follows. Here we read the remote URL from an environment variable and pass it to the constructor; with no arguments, the client connects to a local instance:
from elasticsearch import Elasticsearch
from os import environ
ES_cluster_URL = environ['ES_CLUSTER_URL']
es_client = Elasticsearch() # local
es_client = Elasticsearch([ES_cluster_URL]) # remote
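Note that environ['ES_CLUSTER_URL'] raises a KeyError when the variable is not set. A minimal sketch of a fallback, assuming you want to default to the usual local address; the helper name cluster_hosts and the DEFAULT_URL constant are made up for illustration:

```python
from os import environ

# Assumption: Elasticsearch's usual default address for a local instance
DEFAULT_URL = "http://localhost:9200"


def cluster_hosts(env=environ):
    """Return the host list to hand to the Elasticsearch constructor."""
    # .get() avoids the KeyError raised by env['ES_CLUSTER_URL'] when unset
    url = env.get("ES_CLUSTER_URL") or DEFAULT_URL
    return [url]


# Usage, assuming the elasticsearch package is installed:
# from elasticsearch import Elasticsearch
# es_client = Elasticsearch(cluster_hosts())
```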
You would then build a prototypical query body like this:
body = {
    "from": 10,                  # offset: skip the first 10 hits
    "size": 100,                 # return up to 100 docs (default = 10)
    "fields": ["wanted_field"],  # get only wanted fields (ES 1.x/2.x syntax; later versions typically use "_source")
    "query": {                   # the query itself
        "term": {
            # field/value pair goes here
        }
    },
    "sort": {                    # sort order
        "date_field": {
            "order": "desc"
        }
    }
}
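The from and size keys are what you would vary to page through results. A minimal sketch, assuming zero-based page numbers; page_body is a hypothetical helper, not part of the client:

```python
def page_body(query, page, page_size=100):
    """Build a search body for a zero-based page number."""
    return {
        "from": page * page_size,  # number of hits to skip
        "size": page_size,         # number of hits to return
        "query": query,
    }


# Third page (page=2) of 50 hits each: skips the first 100 hits
body = page_body({"term": {"your_field": "needed_value"}}, page=2, page_size=50)
```

Keep in mind that the cluster caps how deep from + size can go (the index.max_result_window setting, 10,000 by default in recent versions), so this approach suits shallow pagination only.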
You then run the query with:
r = es_client.search(index='myindex',
                     doc_type='mytype',  # mapping types exist only in older ES versions (removed in 8.x)
                     body=body)
The structure of r depends on the type of query you run; for a plain search, the matching documents are listed under r['hits']['hits'].
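As a rough illustration of exploring r, here is a sketch that pulls the documents and the total count out of a search response. The sample_response dict is a hand-written stand-in, not real cluster output, and the helper names are hypothetical. It assumes hits carry a _source key (when you request specific fields they may come back under a fields key instead), and it handles the total being either a plain integer (older versions) or a dict with a value key (7.x and later):

```python
def hit_sources(response):
    """Return the _source document of every hit in a search response."""
    return [hit.get("_source", {}) for hit in response["hits"]["hits"]]


def total_hits(response):
    """Return the total hit count, whichever shape the server used."""
    total = response["hits"]["total"]
    return total["value"] if isinstance(total, dict) else total


# Hand-written stand-in for what es_client.search(...) returns
sample_response = {
    "hits": {
        "total": {"value": 2, "relation": "eq"},
        "hits": [
            {"_id": "1", "_source": {"your_field": "needed_value"}},
            {"_id": "2", "_source": {"your_field": "needed_value"}},
        ],
    }
}

docs = hit_sources(sample_response)
count = total_hits(sample_response)
```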
Let’s see how to do some specific/commonplace queries by tweaking the body object.
A term query
You run a term query when you want to retrieve all documents in which a field exactly matches a given value.
body = {
    "query": {
        "term": {
            "your_field": "needed_value"
        }
    }
}
A range query
body = {
    "query": {
        "range": {
            "date_field": {
                "gte": start_date,
                "lt": final_date
            }
        }
    }
}
Here, start_date and final_date are datetime objects. gt and lt mean “greater than” and “less than” respectively, and a trailing “e” (as in gte and lte) makes the bound inclusive, so the query above selects documents with start_date ≤ date_field < final_date.
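A small sketch of building such a body for a half-open interval. It converts the datetimes to ISO 8601 strings explicitly, so the body does not depend on how the client serializes datetime objects; date_range_body is a hypothetical helper and the dates are arbitrary examples:

```python
from datetime import datetime


def date_range_body(field, start, end):
    """Build a range query selecting start <= field < end."""
    return {
        "query": {
            "range": {
                field: {
                    "gte": start.isoformat(),  # inclusive lower bound
                    "lt": end.isoformat(),     # exclusive upper bound
                }
            }
        }
    }


# All documents dated in January 2016
body = date_range_body("date_field", datetime(2016, 1, 1), datetime(2016, 2, 1))
```

Using gte with lt means consecutive intervals never overlap: a document stamped exactly at final_date belongs to the next interval only.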
A bool query
To combine conditions with AND, you run a so-called bool query. It supports all sorts of logical combinations; here I give the prototype of an AND.
body = {
    "query": {
        "bool": {
            "must": [
                {
                    "term": {
                        "field1": "value1"
                    }
                },
                {
                    "term": {
                        "field2": "value2"
                    }
                }
            ]
        }
    }
}
Here we’re asking for all documents where field1 matches “value1” AND field2 matches “value2”. In a similar way, we could use the must_not keyword to ask for documents that do not match a given value. Other keywords, such as should (OR-like matching) and filter (matching without scoring), cover other use cases.
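To show must and must_not side by side, here is a sketch that builds a bool query from plain field/value dicts. The bool_body helper and field3/value3 are made up for illustration:

```python
def bool_body(must=None, must_not=None):
    """Build a bool query from {field: value} dicts of term conditions."""

    def term_clauses(pairs):
        # One term clause per field/value pair
        return [{"term": {field: value}} for field, value in pairs.items()]

    bool_clause = {}
    if must:
        bool_clause["must"] = term_clauses(must)          # all must match
    if must_not:
        bool_clause["must_not"] = term_clauses(must_not)  # none may match
    return {"query": {"bool": bool_clause}}


# field1 == value1 AND field2 == value2 AND NOT field3 == value3
body = bool_body(
    must={"field1": "value1", "field2": "value2"},
    must_not={"field3": "value3"},
)
```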
Aggregating data
You often need to aggregate documents on a field. A prototype for this would be (here we aggregate on a field called my_field):
body = {
    "size": 0,  # return no hits, only the aggregation
    "aggs": {
        "my_field_agg": {
            "terms": {
                "size": 100,  # maximum number of buckets to return
                "field": "my_field"
            }
        }
    }
}
The size parameter in the aggregation sets how many buckets are returned. Tweak it until the sum_other_doc_count returned in r is 0; otherwise some documents fall into buckets that were left out of the response.
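Reading the result back could look like the sketch below. The sample_response dict is a hand-written stand-in for r rather than real cluster output, and bucket_counts is a hypothetical helper:

```python
def bucket_counts(response, agg_name):
    """Return ({key: doc_count}, complete) for a terms aggregation."""
    agg = response["aggregations"][agg_name]
    counts = {bucket["key"]: bucket["doc_count"] for bucket in agg["buckets"]}
    # A non-zero sum_other_doc_count means some buckets were left out
    complete = agg.get("sum_other_doc_count", 0) == 0
    return counts, complete


# Hand-written stand-in for what es_client.search(...) returns
sample_response = {
    "aggregations": {
        "my_field_agg": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "a", "doc_count": 3},
                {"key": "b", "doc_count": 1},
            ],
        }
    }
}

counts, complete = bucket_counts(sample_response, "my_field_agg")
```

If complete comes back False, raise the aggregation's size and run the query again.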
I’ve collected all of this in a Gist here.