
I started my software adventure with the GW-BASIC programming language. After QuickBASIC I moved on to Visual Basic, and I developed many applications with it until 2000. I stepped into the world of the web with PHP. After that, my path crossed with Java! I have been developing enterprise applications with Java EE technologies since 2005. JavaServer Faces and the Spring Framework are my areas of expertise, and I am trying to specialize in NoSQL technologies. Hüseyin has posted 15 posts at DZone.

Introduction to ElasticSearch

09.16.2013
ElasticSearch is an open source tool written in Java. It is a Lucene-based, scalable full-text search engine and data analysis tool.

A huge amount of data is produced every moment in today’s world of information technology: on social media, on video sharing sites, and in medium- and large-sized companies that provide services in communication, health, security and other areas. This is an ocean of information, and in the world of information technology we call that ocean big data. A significant part of this data is unstructured, scattered, and of little value in isolation.

As a result, this data has to be stored, accessed, analyzed and processed. Like other search engines, ElasticSearch is a tool developed to deal with these big data problems.

ElasticSearch is powerful and flexible, and its real-time, distributed nature is among its biggest advantages. Today, ElasticSearch is used for content search, data analysis, and queries at organizations such as Mozilla, Foursquare, and GitHub.

In order to explore ElasticSearch, we will have a closer look at its basic features and concepts. 


Full-Text Search

As the data stored in a database grows, queries against it become slower. To remedy this, the words in text fields are indexed and cataloged, so that a query can look a word up in the index instead of scanning every record. With such an index, databases respond faster and perform better, even on large-scale data. ElasticSearch provides powerful full-text search capabilities, including multi-language support, a powerful query language, and auto-completion.
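The idea of indexing the words in text fields can be illustrated with a toy inverted index. This is only a minimal sketch of the general technique, not how Lucene or ElasticSearch implements it; the function names and the sample documents are made up for the illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs):
    """Map every word to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of the documents that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample documents, invented for this illustration.
docs = {
    1: "ElasticSearch is a full-text search engine",
    2: "Lucene is a search library written in Java",
    3: "Java is a programming language",
}
index = build_inverted_index(docs)
```

A lookup such as search(index, "java search") only touches the entries for the two query words, which is why this kind of index keeps queries fast as the number of documents grows.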

Index

ElasticSearch is a document-oriented search engine. Each record in ElasticSearch is a structured JSON document. In other words, data that is sent to ElasticSearch for indexing is a JSON document. All fields of the documents are indexed by default and can be used in a single query. 

Compared to database management systems, an ElasticSearch index can be thought of as a database. Just as a database is an organized collection of information, an ElasticSearch index is a collection of structured JSON documents.

Type

Continuing the comparison with database management systems, types can be thought of as tables. An index may contain one or more types.

Mapping

Mapping is the process of defining how a document and its fields are stored and indexed by the search engine. Types are created according to the mapping information. ElasticSearch creates the mapping automatically (dynamic mapping) based on the data sent, detecting field types such as string, integer, double, and boolean. You can override the default mapping by defining an explicit mapping of your own.
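To get a feel for what deriving a mapping from the data means, here is a toy sketch in Python. It is an assumption-laden illustration, not ElasticSearch's actual detection logic: the function names are hypothetical, the type labels simply follow the examples in the text, and the real rules are more detailed.

```python
def infer_field_type(value):
    """Guess a type label for a single JSON value (toy rules only)."""
    # bool must be tested before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "object"
    return "unknown"


def infer_mapping(document):
    """Build a field-name to type-label mapping for one document."""
    return {field: infer_field_type(value) for field, value in document.items()}


# Hypothetical document, invented for this illustration.
sample = {"title": "ElasticSearch", "views": 4097, "rating": 4.5, "published": True}
mapping = infer_mapping(sample)
```

For the sample document, the result labels title as a string, views as an integer, rating as a double, and published as a boolean.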

RESTful API

ElasticSearch is driven by a RESTful API. Almost every action can be performed through this API by sending JSON over HTTP.

How To Install?

Installation consists of downloading the latest ElasticSearch distribution, unzipping it, and running the executable file appropriate to your operating system.

For Unix systems:

bin/elasticsearch -f

For Windows:

bin/elasticsearch.bat

To check that the service is running, send a request from the terminal:

curl -X GET http://localhost:9200/

Alternatively, open http://localhost:9200/ in a browser. If the response looks like the following, the service is running as expected and we are ready to work with ElasticSearch.

hakdogan:elasticsearch hakdogan$ curl -X GET http://localhost:9200/
{
  "ok" : true,
  "status" : 200,
  "name" : "Venom",
  "version" : {
    "number" : "0.90.2",
    "snapshot_build" : false,
    "lucene_version" : "4.3.1"
  },
  "tagline" : "You Know, for Search"
}
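The same check can be scripted. The sketch below parses a copy of the response shown above with Python's standard json module; the helper name is_running is hypothetical, and the field names are the ones in this particular output, so other ElasticSearch versions may return a different shape.

```python
import json

# Copy of the response body shown above.
response_body = """
{
  "ok" : true,
  "status" : 200,
  "name" : "Venom",
  "version" : {
    "number" : "0.90.2",
    "snapshot_build" : false,
    "lucene_version" : "4.3.1"
  },
  "tagline" : "You Know, for Search"
}
"""


def is_running(body):
    """Return True if the body reports ok=true and status=200."""
    info = json.loads(body)
    return info.get("ok") is True and info.get("status") == 200


info = json.loads(response_body)
version_number = info["version"]["number"]
```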

ElasticSearch is schema free. It does not require definitions such as index, type and field type before the indexing process. When a record is added, ElasticSearch tries to identify the data structure, index it, and make it searchable. If desired, index, type, field and field type definitions can be adjusted before or after adding a record.

It is important to understand the flexibility this provides. Being able to keep documents whose fields differ in type, name and number in the same index is undoubtedly a plus. By comparison, Solr, another popular full-text search engine, requires fields and field types to be defined up front, and when a new field has to be added, all records must be sent to Solr again. ElasticSearch has no such restriction, and this can be compared to the table and column independence provided by NoSQL architectures.

Let’s experience this flexibility by adding our first record to ElasticSearch. As mentioned above, ElasticSearch is driven by a RESTful API, so we will add the record with curl (Client URL).

hakdogan:~ hakdogan$ curl -XPUT localhost:9200/kodcucom/article/1 -d '
> {
> "title": "ElasticSearch",
> "content": "ElasticSearch is developed in Java, open source, lucene-based, scalable full-text search engine and data analysis tool.",
> "date": "2013-08-05T12:00:00",
> "author": "Hüseyin Akdoğan"
> }'
{"ok":true,"_index":"kodcucom","_type":"article","_id":"1","_version":1}

With the command above, we have added our first record to ElasticSearch. Now let’s request that record with the following command and look at the output.

hakdogan:~ hakdogan$ curl -XGET 'localhost:9200/kodcucom/article/1?pretty=true'
{
  "_index" : "kodcucom",
  "_type" : "article",
  "_id" : "1",
  "_version" : 1,
  "exists" : true, "_source" : 
{
"title": "ElasticSearch",
"content": "ElasticSearch is developed in Java, open source, lucene-based, scalable full-text search engine and data analysis tool.",
"date": "2013-08-05T12:00:00",
"author": "Hüseyin Akdoğan"
}
}

As we can see, although we did not create an index called kodcucom or a type called article beforehand, ElasticSearch created both with default settings based on the added record.

The record (a JSON document) with the ID 1 has the type article and belongs to the index kodcucom.
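The two curl calls above could also be prepared from Python with the standard library alone. This is a sketch rather than an official client: the helper names are hypothetical, the URL layout /index/type/id is taken from the commands above, and the Content-Type header is an addition that the curl example does not send. Nothing is sent until send() is called, which needs a running node at localhost:9200.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # address used in the curl examples


def document_url(index, doc_type, doc_id):
    """Build the /index/type/id address of a document."""
    return "{}/{}/{}/{}".format(BASE_URL, index, doc_type, doc_id)


def build_index_request(index, doc_type, doc_id, document):
    """Prepare a PUT request that stores the document as JSON."""
    body = json.dumps(document, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        document_url(index, doc_type, doc_id),
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},  # not in the curl example
    )


def build_get_request(index, doc_type, doc_id):
    """Prepare a GET request that fetches the document."""
    return urllib.request.Request(document_url(index, doc_type, doc_id), method="GET")


def send(request):
    """Send a prepared request and decode the JSON reply (needs a running node)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


article = {
    "title": "ElasticSearch",
    "content": "ElasticSearch is developed in Java, open source, lucene-based, "
               "scalable full-text search engine and data analysis tool.",
    "date": "2013-08-05T12:00:00",
    "author": "Hüseyin Akdoğan",
}
put_request = build_index_request("kodcucom", "article", 1, article)
get_request = build_get_request("kodcucom", "article", 1)
```

With a node running, send(put_request) followed by send(get_request) would mirror the two terminal commands.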

Cluster

ElasticSearch has been built to scale horizontally. If more capacity is needed, it is sufficient to increase the number of nodes. In this case, the cluster will reorganize itself in order to take advantage of extra hardware.
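The reorganization described above can be pictured with a toy model. The sketch assumes that an index is split into numbered pieces (ElasticSearch calls them shards) and that they are spread round-robin over the nodes; the function allocate_shards and the node names are hypothetical, and the real allocation logic is considerably more sophisticated.

```python
def allocate_shards(shard_count, nodes):
    """Spread shard numbers over the given nodes in round-robin order."""
    allocation = {node: [] for node in nodes}
    for shard in range(shard_count):
        node = nodes[shard % len(nodes)]
        allocation[node].append(shard)
    return allocation


# The same five shards, before and after a third node joins.
two_nodes = allocate_shards(5, ["node-1", "node-2"])
three_nodes = allocate_shards(5, ["node-1", "node-2", "node-3"])
```

In this model the busiest node holds three shards with two nodes and only two shards once a third node joins, which is the sense in which extra hardware adds capacity.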

Standard ElasticSearch installations share the same default cluster name and, however many of them there are, find and connect to each other automatically when they are on the same network. ElasticSearch configuration files are located in the ElasticSearchHomeDirectory/config folder. To change the cluster name, edit the corresponding line in the elasticsearch.yml file.

################################### Cluster ###################################
# Cluster name identifies your cluster for auto-discovery. If you're running
# multiple clusters on the same network, make sure you're using unique names.
#
# cluster.name: elasticsearch
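Note that the cluster.name line above is commented out, so the default name applies until the line is uncommented and edited. The sketch below illustrates that reading of the file with a deliberately simple line scan instead of a real YAML parser; the function name and the example value kodcu-cluster are hypothetical.

```python
DEFAULT_CLUSTER_NAME = "elasticsearch"  # name used when nothing is configured


def read_cluster_name(config_text):
    """Return the active cluster.name setting, or the default if none is set."""
    for line in config_text.splitlines():
        stripped = line.strip()
        # Skip comments and lines that are not key: value pairs.
        if stripped.startswith("#") or ":" not in stripped:
            continue
        key, _, value = stripped.partition(":")
        if key.strip() == "cluster.name":
            return value.strip()
    return DEFAULT_CLUSTER_NAME


shipped_config = "# cluster.name: elasticsearch\n"
edited_config = "cluster.name: kodcu-cluster\n"
```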

Client Support

Client support is available for many platforms such as Java, PHP, Python, Perl, Ruby, and .NET. Check out the full list here: Clients.

Conclusion

ElasticSearch is quite elastic among its peers in terms of both configuration and usage. It is also an attractive option for systems that work with big data, where search operations and data analysis can otherwise cause I/O bottlenecks.

In future articles, I hope to cover topics such as basic CRUD operations with ElasticSearch, the Java API it provides, and the use of this tool in a web project.
Published at DZone with permission of its author, Hüseyin Akdoğan. (source)
