
I started my software adventure with the GW-BASIC programming language. After QuickBASIC I moved on to Visual Basic and developed many applications with it until 2000. I stepped into the world of the web with PHP. After that, my path crossed with Java! I have been developing enterprise applications with Java EE technologies since 2005. JavaServer Faces and the Spring framework are my areas of expertise. I'm trying to specialize in NoSQL technologies.

The Core ElasticSearch Operations

09.24.2013
As established in our previous article, which introduced ElasticSearch, ElasticSearch is driven by a RESTful API. Almost every action can be performed by sending JSON over HTTP. This hypothetical example uses cURL to perform basic ElasticSearch operations, indexing articles published on kodcu.com by title, content, publication date, tags, and author.

Creating an Index

The Create Index API creates a new index. ElasticSearch supports multiple indices, and operations such as searches can be executed across several indices at once. Custom settings can be provided for each index when it is created.

hakdogan:~ hakdogan$ curl -XPUT 'http://localhost:9200/kodcucom/' -d '
> index:
>     number_of_shards: 2
>     number_of_replicas: 1
> '

With this command, we created an index named kodcucom and specified its number of shards and replicas.

A shard in ElasticSearch is a single Lucene instance. It is managed automatically. An index has five primary shards by default. You can specify the default shard number in the config/elasticsearch.yml file. 

As the example shows, this number can also be set for an individual index when it is created. The number of primary shards cannot be changed after an index has been created. ElasticSearch distributes shards across all nodes in the cluster, and it moves shards automatically when a node fails or a new node is added.
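As a quick worked example of what these settings imply, the sketch below counts the shard copies the cluster has to place for an index. It assumes that number_of_replicas is the number of extra copies kept of each primary shard; the function name total_shard_copies is made up for illustration.

```python
def total_shard_copies(number_of_shards: int, number_of_replicas: int) -> int:
    """Count primary shards plus their replica copies.

    Assumption: every primary shard has `number_of_replicas` extra copies.
    """
    if number_of_shards < 1 or number_of_replicas < 0:
        raise ValueError("need at least one shard and a non-negative replica count")
    return number_of_shards * (1 + number_of_replicas)


if __name__ == "__main__":
    # Settings used for the kodcucom index: 2 primaries, 1 replica of each.
    print(total_shard_copies(2, 1))  # prints 4
    # Five primaries (the default mentioned above) with 1 replica of each.
    print(total_shard_copies(5, 1))  # prints 10
```

With the settings used for kodcucom, the cluster therefore has four shard copies to distribute.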

As stated before, ElasticSearch can analyze a submitted document and automatically create the index and type for it with default settings.

hakdogan:~ hakdogan$ curl -XPUT localhost:9200/kodcucom/article/1 -d '{
> title: "Java API for JSON Processing - Stream-based JSON Produce and Consume",
> content: "Java API for JSON Processing (JSON-P) standard under the umbrella of the Java EE 7 in the JSR-353 specification is an enterprise java technology.",
> postDate: "2013-08-06T12:00:00",
> tags: ["Java"],
> author: "Rahman Usta"
> }'
{"ok":true,"_index":"kodcucom","_type":"article","_id":"1","_version":1}

With the above command, an index named kodcucom and a type named article are created with default settings if they do not already exist, and a record (JSON document) with an ID of 1 is stored in ElasticSearch.
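The same request can be assembled from a script. The following is a minimal sketch using only the Python standard library; it builds the PUT request but deliberately does not send it, so it runs without a node. The base URL mirrors the cURL examples, and build_index_request is a hypothetical helper name, not part of any ElasticSearch client.

```python
import json
import urllib.request

# Assumption: a node listening on localhost:9200, as in the cURL examples.
BASE_URL = "http://localhost:9200"


def build_index_request(index: str, doc_type: str, doc_id: str,
                        document: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT request that stores one document."""
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(url, data=body, method="PUT")


if __name__ == "__main__":
    article = {
        "title": "Java API for JSON Processing - Stream-based JSON Produce and Consume",
        "postDate": "2013-08-06T12:00:00",
        "tags": ["Java"],
        "author": "Rahman Usta",
    }
    request = build_index_request("kodcucom", "article", "1", article)
    print(request.get_method(), request.full_url)
    # Sending is left out on purpose; with a running node it would be:
    # urllib.request.urlopen(request)
```

Because json.dumps always quotes keys, the body produced here is strict JSON, unlike the more lenient form typed at the prompt above.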

Getting a Document

ElasticSearch's Get API allows you to get a document with a specified ID value. 

hakdogan:~ hakdogan$ curl -XGET localhost:9200/kodcucom/article/1?pretty=true
{
  "_index" : "kodcucom",
  "_type" : "article",
  "_id" : "1",
  "_version" : 1,
  "exists" : true, "_source" : {
title: "Java API for JSON Processing - Stream-based JSON Produce and Consume",
content: "Java API for JSON Processing (JSON-P) standard under the umbrella of the Java EE 7 in the JSR-353 specification is an enterprise java technology.",
postDate: "2013-08-06T12:00:00",
tags: ["Java"],
author: "Rahman Usta"}
}

By default, the Get API is real-time and is not affected by the refresh rate of the index. You can also restrict which fields are fetched when getting a document by passing the fields parameter.

hakdogan:~ hakdogan$ curl -XGET localhost:9200/kodcucom/article/1?fields=title,author
{
"_index":"kodcucom",
"_type":"article",
"_id":"1",
"_version":1,
"exists":true,
"fields": {
"author":"Rahman Usta",
"title":"Java API for JSON Processing - Stream-based JSON Produce and Consume"}
}
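The URLs used in the two Get examples can be composed as follows. This is a small sketch under the same assumptions as the cURL commands (a node on localhost:9200); build_get_url is an illustrative name, and only the fields and pretty parameters shown in this article are handled.

```python
from urllib.parse import quote, urlencode

# Assumption: a node listening on localhost:9200, as in the cURL examples.
BASE_URL = "http://localhost:9200"


def build_get_url(index, doc_type, doc_id, fields=None, pretty=False):
    """Compose the URL for fetching one document, optionally limited to some fields."""
    # Escape each path segment so unusual IDs cannot break the URL.
    path = "/".join(quote(part, safe="") for part in (index, doc_type, doc_id))
    params = {}
    if fields:
        params["fields"] = ",".join(fields)
    if pretty:
        params["pretty"] = "true"
    # Keep commas readable, matching the hand-written URL in the example.
    query = urlencode(params, safe=",")
    return f"{BASE_URL}/{path}" + (f"?{query}" if query else "")


if __name__ == "__main__":
    print(build_get_url("kodcucom", "article", "1", pretty=True))
    print(build_get_url("kodcucom", "article", "1", fields=["title", "author"]))
```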

Getting Multiple Documents

The Multi Get API allows you to get multiple documents based on the index, type (optional) and ID. The response includes a docs array with all the fetched documents. 

hakdogan:~ hakdogan$ curl localhost:9200/_mget -d '{
> docs: [
>          {
>           _index: "kodcucom",
>           _type: "article",
>          _id: "1"
>         },
>        {
>        _index: "kodcucom",
>        _type: "article",
>        _id: "2"
>       }
>            ]
> }'
{
   "docs": [
                {"_index":"kodcucom",
                 "_type":"article",
                 "_id":"1",
                 "_version":1,
                 "exists":true, 
                 "_source" : {
                 title: "Java API for JSON Processing - Stream-based JSON Produce and Consume",
                 content: "Java API for JSON Processing (JSON-P) standard under the umbrella of the Java EE 7 in the JSR-353 specification is an enterprise java technology.",
                 postDate: "2013-08-06T12:00:00",
                 tags: ["Java"],
                 author: "Rahman Usta"}
},
                {
                 "_index":"kodcucom",
                 "_type":"article",
                 "_id":"2",
                 "_version":1,
                 "exists":true, 
                 "_source" : {
                 title: "Core ElasticSearch Operations",
                 content: "Elasticsearch is RESTful API driven",
                 postDate: "2013-08-13T09:00:00",
                 tags: ["elasticsearch", "big-data"],
                 author: "Hüseyin Akdoğan"}
}
]
}

The _mget endpoint can also be used with the index, or with both the index and the type, given in the URL.

hakdogan:~ hakdogan$ curl localhost:9200/kodcucom/_mget -d '{
> docs: [
>          {
>           _type: "article",
>          _id: "1"
>         },
>        {
>        _type: "article",
>        _id: "2"
>       }
>        ]
> }'

hakdogan:~ hakdogan$ curl localhost:9200/kodcucom/article/_mget -d '{
> docs: [
>          {
>          _id: "1"
>         },
>        {
>        _id: "2"
>       }
>        ]
> }'

When the index and type are given in the URL, the ids element can be used to simplify the request.

hakdogan:~ hakdogan$ curl localhost:9200/kodcucom/article/_mget -d '{
> ids: ["1", "2"]
> }'

Finally, let's see how to fetch specific fields for each document.

hakdogan:~ hakdogan$ curl localhost:9200/_mget -d '{
> docs: [
>           {
>            _index: "kodcucom",
>            _type: "article",
>            _id: "1",
>            fields: ["title", "author"]
>          },
>         {
>           _index: "kodcucom",
>           _type: "article",
>           _id: "2",
>          fields: ["postDate", "tags"]
>         }
>        ]
> }'
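The request bodies in this section all follow the same pattern, so they can be generated. The sketch below is one way to do that with the standard library; build_mget_body and its parameters are hypothetical names chosen for this example. It emits the short ids form when the index and type are expected to come from the URL, and the docs form otherwise.

```python
import json


def build_mget_body(doc_ids, index=None, doc_type=None, fields_by_id=None):
    """Return the JSON body for an _mget request as a string."""
    fields_by_id = fields_by_id or {}
    # Short form: index and type are in the URL and no per-document fields are set.
    if index is None and doc_type is None and not fields_by_id:
        return json.dumps({"ids": list(doc_ids)})
    docs = []
    for doc_id in doc_ids:
        entry = {}
        if index is not None:
            entry["_index"] = index
        if doc_type is not None:
            entry["_type"] = doc_type
        entry["_id"] = doc_id
        if doc_id in fields_by_id:
            entry["fields"] = list(fields_by_id[doc_id])
        docs.append(entry)
    return json.dumps({"docs": docs})


if __name__ == "__main__":
    # Body for /kodcucom/article/_mget
    print(build_mget_body(["1", "2"]))
    # Body for /_mget with different fields per document
    print(build_mget_body(
        ["1", "2"],
        index="kodcucom",
        doc_type="article",
        fields_by_id={"1": ["title", "author"], "2": ["postDate", "tags"]},
    ))
```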

Searching

The Search API allows you to execute a search query and then get the results that match the query. The search query can either be performed using a simple query string as a parameter, or by using a request body. Below, you can see an example of each use (the example that uses a request body also contains a range query):

hakdogan:~ hakdogan$ curl -XGET localhost:9200/kodcucom/article/_search?fields=title,author

hakdogan:~ hakdogan$ curl -XGET localhost:9200/kodcucom/article/_search -d '{
> query: {range: {postDate: {from: "2013-01-01", to: "2013-08-13"}}}
> }'
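The range query body above can also be built from date values rather than typed by hand. This is a minimal sketch that mirrors the from/to form used in the example; build_range_query is an illustrative helper, not an ElasticSearch API.

```python
import json
from datetime import date


def build_range_query(field: str, start: date, end: date) -> dict:
    """Build a search body that matches documents whose field lies between two dates."""
    if start > end:
        raise ValueError("start must not be later than end")
    return {
        "query": {
            "range": {
                field: {"from": start.isoformat(), "to": end.isoformat()}
            }
        }
    }


if __name__ == "__main__":
    body = build_range_query("postDate", date(2013, 1, 1), date(2013, 8, 13))
    # This string is what would be passed to cURL with -d.
    print(json.dumps(body))
```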

Updating

The ElasticSearch Update API supports updating a document either with a script or by passing a partial document, which is merged into the existing document. ElasticSearch versions every indexed document and uses that version to make sure no other update has happened between reading the document and reindexing it. An update still means a full reindex of the document.

hakdogan:~ hakdogan$ curl -XPOST localhost:9200/kodcucom/article/1/_update -d '{
> script: "ctx._source.tags += tag",
> params: {
> tag: "json-p"}
> }'
{"ok":true,"_index":"kodcucom","_type":"article","_id":"1","_version":2}

Above, you see a script update. The ElasticSearch scripting module lets you use scripts to evaluate custom expressions. It uses MVEL by default; with language plug-ins, scripts can also be written in other languages, such as JavaScript, Groovy and Python.

Let’s return to the above command. ctx is the script's context; through it, the script reaches the document's _source and appends a value to its tags field. A script can also take parameters: here the params section assigns a value to the tag parameter, which the script then refers to by name.
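To make the effect of that script concrete, here is a purely local model of it in Python. It does not run MVEL or talk to ElasticSearch; it only imitates what the script does to the stored record, under the assumption that each successful update raises the version by one, as the outputs in this article show. The names apply_tag_script and stored are invented for this sketch.

```python
import copy


def apply_tag_script(stored: dict, tag: str) -> dict:
    """Imitate the script 'ctx._source.tags += tag' on a copy of a stored record."""
    updated = copy.deepcopy(stored)
    # ctx gives the script access to the document source.
    ctx = {"_source": updated["_source"]}
    ctx["_source"]["tags"].append(tag)
    # Assumption: every successful update increments the version.
    updated["_version"] += 1
    return updated


if __name__ == "__main__":
    stored = {"_id": "1", "_version": 1, "_source": {"tags": ["Java"]}}
    result = apply_tag_script(stored, "json-p")
    print(result["_version"], result["_source"]["tags"])
```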

You can also add a new field to the document with a script update.

hakdogan:~ hakdogan$ curl -XPOST localhost:9200/kodcucom/article/1/_update -d '{
> script: "ctx._source.temporaryField = \"temporary text\""
> }'
{"ok":true,"_index":"kodcucom","_type":"article","_id":"1","_version":3}

You can remove a field from the document in the same way.

hakdogan:~ hakdogan$ curl -XPOST localhost:9200/kodcucom/article/1/_update -d '{
> script: "ctx._source.remove(\"temporaryField\")"
> }'
{"ok":true,"_index":"kodcucom","_type":"article","_id":"1","_version":4}

Note how the version number in the output increases with every update. The Update API also accepts a partial document, which is merged into the existing document.

hakdogan:~ hakdogan$ curl -XPOST localhost:9200/kodcucom/article/1/_update -d '{
> doc: {
>   author: "RAHMAN USTA"
> }
> }'
{"ok":true,"_index":"kodcucom","_type":"article","_id":"1","_version":5}

Note that if both doc and script are specified, doc is ignored. ElasticSearch also supports preloaded scripts.
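The merge of a partial document can be pictured with a few lines of Python. This is a sketch of the idea rather than ElasticSearch's implementation: it assumes nested objects are merged key by key while plain values and arrays are replaced, and merge_partial is a name made up for the example.

```python
def merge_partial(source: dict, partial: dict) -> dict:
    """Return a new dict with `partial` merged into `source`."""
    merged = dict(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested objects are merged recursively (assumed behavior).
            merged[key] = merge_partial(merged[key], value)
        else:
            # Plain values and arrays simply replace the old value.
            merged[key] = value
    return merged


if __name__ == "__main__":
    source = {"title": "Java API for JSON Processing", "author": "Rahman Usta", "tags": ["Java"]}
    print(merge_partial(source, {"author": "RAHMAN USTA"}))
```

Fields that the partial document does not mention, such as title and tags here, are left as they were.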

Deleting

Before examining the ElasticSearch Delete API, it is worth noting that a document (record) can also be deleted by a script, depending on the value of one of its fields.

hakdogan:~ hakdogan$ curl -XPOST localhost:9200/kodcucom/article/1/_update -d '{
> script: "ctx._source.tags.contains(tag) ? ctx.op = \"delete\" : ctx.op = \"none\"",
> params: { tag: "Java"}
> }'
{"ok":true,"_index":"kodcucom","_type":"article","_id":"1","_version":6}
hakdogan:~ hakdogan$ curl -XGET localhost:9200/kodcucom/article/1?pretty=true
{
  "_index" : "kodcucom",
  "_type" : "article",
  "_id" : "1",
  "exists" : false
}

The Delete API allows you to delete a document whose ID value is specified. 

hakdogan:~ hakdogan$ curl -XDELETE localhost:9200/kodcucom/article/1
{"ok":true,"found":true,"_index":"kodcucom","_type":"article","_id":"1","_version":2}

The Delete Index API allows you to delete an index. 

hakdogan:~ hakdogan$ curl -XDELETE localhost:9200/kodcucom

The Delete Index API can also be applied to more than one index, or to all indices: when no index is specified, all indices are deleted.

hakdogan:~ hakdogan$ curl -XDELETE localhost:9200

To disable the ability to delete all indices with a single command, set action.disable_delete_all_indices to true in the configuration.
Published at DZone with permission of its author, Hüseyin Akdoğan. (source)
